Callbacks will be triggered just after allocation and just
before deallocation, to ensure that the memory address space
referenced in the callback is always valid by the time the
callback is called.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Each process will have its own callbacks. Callbacks will indicate
whether an allocation or a deallocation has happened, and will
also provide the start VA address and length of the affected block.
Since memory hotplug is supported neither on FreeBSD nor in legacy
mem mode, it will not be possible to register callbacks in either case.
Callbacks are called whenever something happens to the memory map of
the current process, so at those times the memory hotplug subsystem
is write-locked; attempting to use memory hotplug functions from
within a callback therefore leads to deadlock. Document this limitation.
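As an illustration, registration might look like the following (a
minimal sketch; the extra user-argument parameter follows the callback
signature of later DPDK releases, and error handling is elided):

    #include <stdio.h>
    #include <rte_common.h>
    #include <rte_memory.h>

    /* Runs with the memory hotplug lock held: do not allocate or free
     * DPDK memory from here, and do not iterate over the memory map. */
    static void
    mem_event_cb(enum rte_mem_event event, const void *addr, size_t len,
            void *arg __rte_unused)
    {
        if (event == RTE_MEM_EVENT_ALLOC)
            printf("mapped %zu bytes at %p\n", len, addr);
        else /* RTE_MEM_EVENT_FREE: area is still valid at this point */
            printf("unmapping %zu bytes at %p\n", len, addr);
    }

    static int
    register_cb(void)
    {
        /* fails on FreeBSD and in legacy mem mode */
        return rte_mem_event_callback_register("example-cb",
                mem_event_cb, NULL);
    }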
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This enables multiprocess synchronization for memory hotplug
requests at runtime (as opposed to initialization).
The basic workflow is the following. The primary process always does
the initial mapping and unmapping, and secondary processes always
follow the primary's page map. Only one allocation request can be
active at any one time.
When the primary allocates memory, it ensures that all other processes
have mapped the same set of hugepages successfully; otherwise, any
allocations made are rolled back and the memory is returned to the
system. The heap is locked throughout the process, and there is also
a global memory hotplug lock, so no race conditions can occur.
When the primary frees memory, it removes the area from the heap,
deallocates the affected pages, and notifies other processes of the
deallocation. Since the area is removed from the heap, it effectively
becomes invisible to other processes even if they fail to unmap that
specific set of pages, so it is completely safe to ignore the results
of sync requests.
When a secondary allocates memory, it does not do so by itself.
Instead, it sends a request to the primary process to try and
allocate pages of the specified size and on the specified socket,
such that the specified heap allocation request can complete. The
primary process then sends all secondaries (including the requestor)
a separate notification of the allocated pages, and expects all
secondary processes to report success before considering the pages
"allocated". Only after the primary process ensures that all memory
has been successfully mapped in every secondary process will it
respond positively to the initial request and let the secondary
proceed with the allocation. Since the heap now has memory that can
satisfy the allocation request, and it was locked all this time (so
no other allocations could take place), the secondary process will
be able to allocate memory from the heap.
When a secondary frees memory, it hides the pages to be deallocated
from the heap. It then sends a deallocation request to the primary
process, which deallocates the pages itself and sends a separate
sync request to all other processes (including the requestor) to
unmap the same pages. This way, even if the secondary fails to
notify other processes of this deallocation, that memory becomes
invisible to other processes and will not be allocated from again.
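An illustrative outline of the secondary-side allocation flow
described above (all names here are hypothetical stubs, not the
actual internal symbols):

    #include <stddef.h>

    /* Hypothetical sketch; the real implementation lives in the EAL's
     * malloc IPC code and uses different names. */
    struct alloc_request {
        size_t page_sz;
        int n_pages;
        int socket;
    };

    /* stub: send an IPC request to the primary, block on the reply */
    extern int ask_primary_to_allocate(const struct alloc_request *req);

    static int
    secondary_heap_grow(size_t page_sz, int n_pages, int socket)
    {
        struct alloc_request req = {
            .page_sz = page_sz, .n_pages = n_pages, .socket = socket,
        };

        /* the primary maps the pages, notifies every secondary
         * (including us) to map them too, and replies positively only
         * once all processes have reported success */
        if (ask_primary_to_allocate(&req) < 0)
            return -1;

        /* the heap was locked for the whole exchange, so the new
         * memory is still there for our pending allocation */
        return 0;
    }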
So, to summarize: address space will only become part of the heap
if the primary process can ensure that all other processes have
mapped this memory successfully. If anything goes wrong, the worst
that can happen is that a page will "leak" and will be available to
neither DPDK nor the system, as some process will still hold onto
it. It's not an actual leak, as we can account for the page; it's
just that none of the processes will be able to use this page for
anything useful until it gets allocated from again by the primary.
Due to the underlying DPDK IPC implementation being single-threaded,
some asynchronous magic had to be done, as we need to complete
several requests before we can definitively allow the secondary
process to use the allocated memory (namely, it has to be present in
all other secondary processes before it can be used). Additionally,
only one allocation request is allowed to be submitted at a time.
Memory allocation requests are only allowed when there are no
secondary processes currently initializing. To enforce that, a
shared rwlock is used: it is read-locked during init (so that
several secondaries can initialize concurrently) and write-locked
while making allocation requests (so that either secondary init has
to wait, or the allocation request has to wait until all processes
have initialized).
Any other function that wishes to iterate over memory or prevent
allocations should use the memory hotplug lock.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This set of changes enables rte_malloc to allocate and free memory
as needed. Currently, it is disabled because legacy mem mode is
enabled unconditionally.
The way it works is: first, malloc checks if there is enough memory
already allocated to satisfy the user's request; if there isn't, we
try to allocate more memory. The reverse happens with free: we free
an element, check its size (including free-element merging due to
adjacency), and see if it's bigger than the hugepage size and
whether its start and end span a hugepage or more. If so, we remove
the area from the malloc heap (adjusting element lengths where
appropriate) and deallocate the page(s).
For legacy mode, runtime alloc/free of pages is disabled.
It is worth noting that memseg lists are sorted by page size, and
that we try our best to satisfy the user's request. That is, if the
user requests an element from 2MB-page memory, we will check if we
can satisfy that request from existing memory; if not, we try to
allocate more 2MB pages. If that fails and the user also specified
the "size is hint" flag, we then check other page sizes and try to
allocate from there. If that fails too, then, depending on flags,
we may try allocating from other sockets. In other words, we try
our best to give the user what they asked for, but going to other
sockets is a last resort - first we try to allocate more memory on
the same socket.
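From the user's point of view the malloc API is unchanged; a sketch
(assuming non-legacy mode with hugepages available on socket 0):

    #include <rte_malloc.h>

    static int
    alloc_example(void)
    {
        /* may trigger mapping of new hugepages if the heap cannot
         * satisfy the request from already-allocated memory */
        void *buf = rte_malloc_socket("example", 1 << 22, 0, 0);

        if (buf == NULL)
            return -1;

        /* freeing may return whole hugepages back to the system if
         * the merged free element spans one or more complete pages */
        rte_free(buf);
        return 0;
    }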
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Since we are going to need to map hugepages in both primary and
secondary processes, we need to know where we should look for
hugetlbfs mountpoints. So, share those with secondary processes,
and map them on init.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
In preparation for implementing multiprocess support, we are adding
a version number to memseg lists. We will not need any locks, because
memory hotplug will have a global lock (so any time the memory map,
and thus the version number, might change, we will already be
holding a lock).
There are two ways of implementing multiprocess support for memory
hotplug: either all information about mapped memory is shared
between processes, and secondary processes simply attempt to
map/unmap memory based on requests from the primary, or secondary
processes store their own maps and only check if they are in sync
with the primary process' maps.
This implementation opts for the latter option: the primary
process's mappings will be authoritative, and each secondary process
will keep its own internal view of mapped memory and will attempt
to synchronize with the primary's mappings using versioning.
Under this model, only the primary process will decide which pages
get mapped, and secondary processes will only copy the primary's
page maps and get notified of changes via the IPC mechanism (coming
in later commits).
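Conceptually, a secondary's sync check boils down to something like
this (a hypothetical sketch; the 'version' field name is assumed
here, and the actual logic lives in the memalloc sync handlers):

    #include <rte_eal_memconfig.h>

    /* compare our local view of a memseg list against the shared one */
    static int
    needs_sync(const struct rte_memseg_list *local,
            const struct rte_memseg_list *shared)
    {
        /* a version bump means the primary has changed the page map;
         * the secondary would then walk the shared page map, map or
         * unmap whatever differs, and adopt the new version number */
        return local->version != shared->version;
    }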
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
For now, memory is always contiguous because legacy mem mode is
enabled unconditionally, but this function will be helpful down
the line, when we implement support for allocating physically
non-contiguous memory. We can no longer guarantee physically
contiguous memory unless we're in legacy or IOVA_AS_VA mode, but
we can certainly try and see if we succeed.
In addition, this would be useful for e.g. PMDs, which may allocate
chunks that are smaller than the page size but must not cross a
page boundary, in which case we will be able to accommodate that
request. This function will also support non-hugepage memory.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Currently, DPDK stores all pages as separate files in hugetlbfs.
This option will allow storing all pages in one file (one file
per memseg list).
We do this by using fallocate() calls on Linux; however, this is
only supported on fairly recent (4.3+) kernels, so an ftruncate()
fallback is provided to grow (but not shrink) hugepage files.
The naming scheme is deterministic, so both primary and secondary
processes will be able to easily map the needed files and offsets.
For multi-file segments, we can close fds right away. For
single-file segments, we can reuse the same fd and reduce the
number of fds needed to map/use hugepages. However, we need to
store the fds somewhere, so we add a tailq.
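The grow path looks roughly like this (a sketch; real code also
punches holes on free and handles locking):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    /* grow the per-list file so it can back the page at 'offset';
     * fall back to ftruncate() (grow-only) on pre-4.3 kernels */
    static int
    grow_segment_file(int fd, uint64_t offset, uint64_t page_sz)
    {
        if (fallocate(fd, 0, offset, page_sz) == 0)
            return 0;
        if (errno == ENOTSUP)
            return ftruncate(fd, offset + page_sz);
        return -1;
    }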
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This isn't used anywhere yet, but the support is now there. Also,
adding cleanup to allocation procedures, so that if we fail to
allocate everything we asked for, we can free all of it back.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Nothing uses this code yet. The bulk of it is copied from old
memory allocation code (linuxapp eal_memory.c). We provide an
EAL-internal API to allocate either one page or multiple pages,
guaranteeing that we'll get contiguous VA for all of the pages
that we requested.
Not supported on FreeBSD.
Locking is done via fcntl() because that way, when it comes to
taking out write locks or unlocking on deallocation, we don't
have to keep the original fds around. Plus, using fcntl() gives us
the ability to lock parts of a file, which is useful for single-file
segments, which are coming down the line.
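For example, locking a single page's byte range within a segment
file might look like this (a sketch):

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    /* lock only this page's range within the file; unlocking later
     * does not require the original fd - any fd for the file will do */
    static int
    lock_page_range(int fd, uint64_t offset, uint64_t page_sz, int type)
    {
        struct flock lck = {
            .l_type = type, /* F_RDLCK, F_WRLCK or F_UNLCK */
            .l_whence = SEEK_SET,
            .l_start = offset,
            .l_len = page_sz,
        };

        return fcntl(fd, F_SETLK, &lck);
    }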
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
It's there, so we might as well use it; some operations will be
sped up by it.
Since we have to allocate an fbarray for memzones, we have to do
it before we initialize the memory subsystem, because memory init
will (later, in secondary processes) allocate more fbarrays than
the primary process did, which would make it impossible to attach
to the memzone fbarray if it were created after the fact.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Before, we were aggregating multiple pages into one memseg, so the
number of memsegs was small. Now, each page gets its own memseg,
so the list of memsegs is huge. To accommodate the new memseg list
size and to keep the under-the-hood workings sane, the memseg list
is now not just a single list, but multiple lists. To be precise,
each hugepage size available on the system gets one or more memseg
lists, per socket.
In order to support dynamic memory allocation, we reserve all
memory in advance (unless we're in 32-bit legacy mode, in which
case we do not preallocate memory). As in, we do an anonymous
mmap() of the entire maximum size of memory per hugepage size, per
socket (which is limited to either RTE_MAX_MEMSEG_PER_TYPE pages or
RTE_MAX_MEM_MB_PER_TYPE megabytes worth of memory, whichever is the
smaller one), split over multiple lists (which are limited to
either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_MB_PER_LIST
megabytes per list, whichever is the smaller one). There is also
a global limit of CONFIG_RTE_MAX_MEM_MB megabytes, which is mainly
used for 32-bit targets to limit amounts of preallocated memory,
but can be used to place an upper limit on total amount of VA
memory that can be allocated by DPDK application.
So, for each hugepage size, we get (by default) up to 128G worth
of memory, per socket, split into chunks of up to 32G in size.
The address space is claimed at the start, in eal_common_memory.c.
The actual page allocation code is in eal_memalloc.c (Linux-only),
and largely consists of copied EAL memory init code.
Pages in the list are also indexed by address. That is, in order
to figure out where the page belongs, one can simply look at base
address for a memseg list. Similarly, figuring out IOVA address
of a memzone is a matter of finding the right memseg list, getting
offset and dividing by page size to get the appropriate memseg.
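The lookup is plain pointer arithmetic; roughly (a sketch using the
memseg list fields):

    #include <stdint.h>
    #include <rte_eal_memconfig.h>
    #include <rte_fbarray.h>

    /* find the memseg backing 'addr' within a given memseg list */
    static struct rte_memseg *
    addr_to_memseg(const struct rte_memseg_list *msl, const void *addr)
    {
        uintptr_t off = (uintptr_t)addr - (uintptr_t)msl->base_va;

        return rte_fbarray_get(&msl->memseg_arr, off / msl->page_sz);
    }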
This commit also removes rte_eal_dump_physmem_layout() call,
according to deprecation notice [1], and removes that deprecation
notice as well.
On 32-bit targets, due to limited VA space, DPDK will no longer
spread memory to different sockets as before. Instead, it will
(by default) allocate all of the memory on the socket where the
master lcore is. To override this behavior, --socket-mem must be used.
The rest of the changes are really ripple effects from the memseg
change - heap changes, compile fixes, and rewrites to support
fbarray-backed memseg lists. Due to earlier switch to _walk()
functions, most of the changes are simple fixes, however some
of the _walk() calls were switched to memseg list walk, where
it made sense to do so.
Additionally, we are switching locks from flock() to fcntl().
Down the line, we will be introducing a single-file segments option,
and we cannot use flock() locks to lock parts of a file. Therefore,
we will use fcntl() locks for legacy mem as well, in case someone is
unfortunate enough to accidentally start a legacy mem primary
process alongside an already running non-legacy primary process.
[1] http://dpdk.org/dev/patchwork/patch/34002/
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
rte_fbarray is a simple indexed array stored in shared memory
via mapping files into memory. The rationale for its existence is
the following: since we are going to map memory page-by-page, there
could be quite a lot of memory segments to keep track of (for
smaller page sizes, the page count can easily reach thousands). We
can't really make page lists truly dynamic and infinitely expandable,
because that involves reallocating memory (which is a big no-no in
multiprocess). What we can do instead is set the maximum capacity to
something really, really large, and decide at allocation time how
big the array is going to be. We map the entire file into memory,
which makes it possible to use fbarray as shared memory, provided
the structure itself is allocated in shared memory. Per-fbarray
locking is also used to avoid index data races (but not contents
data races - that is up to the user application to synchronize).
In addition, understanding that we will frequently need to scan
this array for free space and that iterating over the array linearly
can become slow, rte_fbarray provides facilities to index the
array's usage. The following use cases are covered:
- find next free/used slot (useful either for adding new elements
  to the fbarray, or walking the list)
- find the starting index for the next N free/used slots (useful
  for when we want to allocate a chunk of VA-contiguous memory
  composed of several pages)
- find how many contiguous free/used slots there are, starting
  from a specified index (useful for when we want to figure out
  how many pages we have until the next hole in allocated memory,
  to speed up some bulk operations where we would otherwise have
  to walk the array and add pages one by one)
This is accomplished by storing a usage mask in-memory, right
after the data section of the array, and using some bit-level
magic to figure out the info we need.
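Basic usage, as a sketch:

    #include <stdint.h>
    #include <rte_fbarray.h>

    static int
    fbarray_example(void)
    {
        struct rte_fbarray arr;
        uint64_t *slot;
        int idx;

        /* capacity is fixed at init; the backing file is mapped in
         * full, so other processes can attach to the same array */
        if (rte_fbarray_init(&arr, "example", 1024, sizeof(*slot)) < 0)
            return -1;

        idx = rte_fbarray_find_next_free(&arr, 0);
        slot = rte_fbarray_get(&arr, idx);
        *slot = 42;
        rte_fbarray_set_used(&arr, idx);

        /* how many used slots follow 'idx' contiguously? */
        return rte_fbarray_find_contig_used(&arr, idx);
    }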
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This adds a "--legacy-mem" command-line switch. It will be used to
go back to the old memory behavior, one where we can't dynamically
allocate/free memory (the downside), but one where the user can
get physically contiguous memory, like before (the upside).
For now, nothing but the legacy behavior exists; the non-legacy
memory init sequence will be added later. For FreeBSD, non-legacy
memory init will never be enabled, while for Linux, it is
disabled in this patch to avoid breaking bisect, but will be
enabled once non-legacy mode is fully operational.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Currently, it is not possible to use memory that is not owned by
DPDK to perform DMA. This scenario might arise in vhost applications
(like SPDK) where the guest sends its own memory table. To fill this
gap, provide an API to allow registering an arbitrary address range
in the VFIO container.
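A sketch of the intended use (Linux/VFIO only; the IOVA is chosen
by the application, and error handling is elided):

    #include <stdint.h>
    #include <rte_vfio.h>

    /* register an externally allocated buffer (e.g. guest memory
     * shared via vhost) with the VFIO container for DMA */
    static int
    map_external_memory(void *va, uint64_t iova, uint64_t len)
    {
        return rte_vfio_dma_map((uint64_t)(uintptr_t)va, iova, len);
    }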
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This can be used as a virt2iova function that only looks up
memory that is owned by DPDK (as opposed to doing pagemap walks).
Using this will result in less dependency on internals of mem API.
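For example, a DPDK-only virt2iova helper built on top of it (a
sketch):

    #include <stdint.h>
    #include <rte_memory.h>

    static rte_iova_t
    dpdk_virt2iova(const void *addr)
    {
        /* passing NULL for the list makes the lookup by address */
        const struct rte_memseg *ms = rte_mem_virt2memseg(addr, NULL);

        if (ms == NULL)
            return RTE_BAD_IOVA; /* not DPDK-owned memory */
        return ms->iova + ((uintptr_t)addr - (uintptr_t)ms->addr);
    }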
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This is a reverse lookup of PA to VA. Using this will make
other code less dependent on the internals of the mem API.
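Usage is a one-liner (a sketch):

    #include <rte_memory.h>

    /* returns NULL if the IOVA is not backed by DPDK-owned memory */
    static void *
    dpdk_iova2virt(rte_iova_t iova)
    {
        return rte_mem_iova2virt(iova);
    }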
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This function is meant to walk over the first segment of each
VA-contiguous group of memsegs.
This is done so that future users of this function depend less on
the internals of the mem API, and so that later change sets carry
less noise.
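For example (a sketch; the callback gets the first memseg of each
VA-contiguous chunk plus the chunk's total length, and returning
non-zero stops the walk):

    #include <stdio.h>
    #include <rte_common.h>
    #include <rte_memory.h>

    static int
    dump_chunk(const struct rte_memseg_list *msl,
            const struct rte_memseg *ms, size_t len, void *arg)
    {
        RTE_SET_USED(msl);
        RTE_SET_USED(arg);

        printf("VA-contiguous chunk at %p, %zu bytes\n", ms->addr, len);
        return 0; /* keep walking */
    }

    /* invoked as: rte_memseg_contig_walk(dump_chunk, NULL); */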
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
For code that might need to iterate over the list of allocated
segments, using this API will make it more resilient to
internal API changes and will prevent copying the same
iteration code over and over again.
Additionally, locking will be implemented down the line, so
users of this API will not need to care about locking either.
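For example, totalling up all allocated segments (a sketch):

    #include <stdint.h>
    #include <rte_common.h>
    #include <rte_memory.h>

    static int
    add_seg_len(const struct rte_memseg_list *msl,
            const struct rte_memseg *ms, void *arg)
    {
        uint64_t *total = arg;

        RTE_SET_USED(msl);
        *total += ms->len;
        return 0;
    }

    /* usage: uint64_t total = 0; rte_memseg_walk(add_seg_len, &total); */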
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This adds a new flag to request a reserved memzone to be IOVA
contiguous. This is useful for allocating hardware resources like
NIC rings/queues etc. For now, hugepage memory is always contiguous,
but we need to prepare the drivers for the switch.
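For example, a driver reserving descriptor ring memory might do
the following (a sketch; the zone name is illustrative):

    #include <rte_memzone.h>

    static const struct rte_memzone *
    alloc_ring_memory(size_t len, int socket_id)
    {
        /* request IOVA-contiguous memory; reservation fails if no
         * such memory can be provided */
        return rte_memzone_reserve("example_ring", len, socket_id,
                RTE_MEMZONE_IOVA_CONTIG);
    }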
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
No major changes here, just some checks added in a few key places,
and a new parameter to pass around.
Also, add a function to check a malloc element for physical
contiguity. For now, assume hugepage memory is always contiguous,
while non-hugepage memory will be checked.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We shouldn't ever panic in libraries, let alone in EAL, so
replace all panic messages with error messages.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This will be needed because we need to know how big the new empty
space is, to check whether we can free some pages as a result.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We will need to be able to remove entries from heaps' free lists
during certain events, such as rollbacks, or when freeing memory
back to the system (where a previously existing element disappears
and thus can no longer be in the free list).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Down the line, we will need to join free segments to determine
whether the resulting contiguous free space is bigger than a
page size, allowing us to free some memory back to the system.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Malloc heap is now a doubly linked list, so it's now possible to
iterate over each malloc element regardless of its state.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
As we are preparing for dynamic memory allocation, we need to be
able to handle holes in our malloc heap, hence we're switching to
a doubly linked list and preparing the infrastructure to support it.
Since our heap is now aware of where its first and last elements
are, there is no longer any need to have a dummy element at the end
of each heap, so get rid of that as well. Instead, let insert/remove/
join/split operations handle end-of-list conditions automatically.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Down the line, we will need to do everything through the heap, as
any alloc or free may trigger allocating/freeing OS memory, which
would involve growing/shrinking the heap.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Move get_virtual_area out of linuxapp EAL memory and make it
common to EAL, so that other code could reserve virtual areas
as well.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Align Mellanox SPDX copyrights to a single format.
In addition, replace license headers with SPDX tags in files that
were missed.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Replace the BSD license header with the SPDX tag for files
with a RehiveTech and Cavium copyright on them.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Replace the BSD license header with the SPDX tag for files
with only a RehiveTech copyright on them.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
During lcore scan, find all socket IDs and store them, and
provide a public API to query valid socket IDs. This will break
the ABI, so bump the ABI version.
Also, remove the deprecation notice corresponding to this change.
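For example (a sketch):

    #include <stdio.h>
    #include <rte_lcore.h>

    static void
    print_sockets(void)
    {
        unsigned int i;

        for (i = 0; i < rte_socket_count(); i++)
            printf("socket %d is present\n", rte_socket_id_by_idx(i));
    }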
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This API is similar to the blocking API that is already present,
but the reply will be received in a separate callback by the caller
(the callback is specified at request time, rather than registered
in advance).
Under the hood, we create a separate thread to deal with replies to
asynchronous requests; it will just wait to be notified by the
main thread, or be woken up on a timer.
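A sketch of issuing an asynchronous request ("example_action" is an
illustrative action name):

    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <rte_eal.h>
    #include <rte_string_fns.h>

    /* runs in the IPC thread once all peers replied or the timeout hit */
    static int
    reply_cb(const struct rte_mp_msg *request,
            const struct rte_mp_reply *reply)
    {
        printf("'%s': %d of %d replies received\n", request->name,
                reply->nb_received, reply->nb_sent);
        return 0;
    }

    static int
    send_async_request(void)
    {
        struct rte_mp_msg msg;
        struct timespec ts = { .tv_sec = 5, .tv_nsec = 0 };

        memset(&msg, 0, sizeof(msg));
        strlcpy(msg.name, "example_action", sizeof(msg.name));

        /* returns immediately; reply_cb is invoked later */
        return rte_mp_request_async(&msg, &ts, reply_cb);
    }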
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Rename rte_mp_request to rte_mp_request_sync to indicate
that this request will be done synchronously (as opposed to
asynchronous request, which comes in next patch).
Also, fix alphabetical ordering for .map file.
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Originally, there was only one type of request which was used
for multiprocess synchronization (hence the name - sync request).
However, now that we are going to have two types of requests,
synchronous and asynchronous, having it named "sync request" is
very confusing, so we will rename it to "pending request". This
is internal-only, so no externally visible API changes.
Suggested-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Since we have support for the strlcpy function in DPDK, replace all
instances where a string is copied using snprintf.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
The strncpy function is error prone for doing "safe" string copies, so
we generally try to use "snprintf" instead in the code. The function
"strlcpy" is a better alternative, since it better conveys the
intention of the programmer, and doesn't suffer from the non-null
terminating behaviour of it's n'ed brethern.
The downside of this function is that it is not available by default
on linux, though standard in the BSD's. It is available on most
distros by installing "libbsd" package.
This patch therefore provides the following in rte_string_fns.h to ensure
that strlcpy is available there:
* for BSD, include string.h as normal
* if RTE_USE_LIBBSD is set, include <bsd/string.h>
* if not set, fallback to snprintf for strlcpy
Using make build system, the RTE_USE_LIBBSD is a hard-coded value to "n",
but when using meson, it's automatically set based on what is available
on the platform.
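Typical use (a sketch):

    #include <rte_string_fns.h>

    static void
    set_name(char dst[16], const char *src)
    {
        /* always NUL-terminates, truncating if necessary - unlike
         * strncpy; reads as a string copy, unlike the
         * snprintf(dst, sz, "%s", src) idiom */
        strlcpy(dst, src, 16);
    }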
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Add 32-bit and 64-bit APIs to align a given integer to the previous
power of 2. Update the common autotest to include tests for the
previous power of 2 for both 32- and 64-bit integers.
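For example (a sketch):

    #include <stdint.h>
    #include <rte_common.h>

    static void
    prevpow2_example(void)
    {
        uint32_t a = rte_align32prevpow2(33);   /* a == 32 */
        uint64_t b = rte_align64prevpow2(1000); /* b == 512 */

        (void)a;
        (void)b;
    }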
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Dynamic log types are registered at the RTE_INIT() step. This
allows one to set log levels via EAL options on application launch.
However, it does not allow managing log types that are created at
runtime.
EAL does not store log levels and types passed from the command
line, so they cannot be picked up later. This is an obvious flaw,
since it would be better to be able to pick levels for dynamic
types registered for runtime-determined facilities such as NIC
ports.
This patch provides a mechanism to store log levels
passed from EAL options and adds an API to register
log types and pick levels from the internal storage.
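A sketch of the intended use ("pmd.example.port" is an illustrative
type name):

    #include <rte_log.h>

    static int example_logtype;

    static void
    init_example_log(void)
    {
        /* registers the type and applies any level the user passed
         * on the command line for it */
        example_logtype = rte_log_register_type_and_pick_level(
                "pmd.example.port", RTE_LOG_NOTICE);
        if (example_logtype < 0)
            example_logtype = RTE_LOGTYPE_EAL; /* fallback */
    }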
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
To handle atomic updates of link status (64-bit), every driver
was doing its own version using cmpset.
Atomic exchange is a useful primitive in its own right;
therefore, make it an EAL routine.
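For example (a sketch of the link-status use case):

    #include <stdint.h>
    #include <rte_atomic.h>

    /* atomically publish a new 64-bit link status and fetch the old
     * one - no compare-and-set retry loop needed */
    static uint64_t
    publish_link_status(volatile uint64_t *status, uint64_t new_val)
    {
        return rte_atomic64_exchange(status, new_val);
    }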
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
If we receive messages that don't have a callback registered for
them, and we haven't finished initialization yet, it can reasonably
be inferred that we shouldn't have gotten the message in the first
place. Therefore, send the requester a special message telling them
to ignore the response to this request, as if this process weren't
there.
Since it is not possible for the primary process to receive any
messages during initialization, this change in practice only applies
to secondary processes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
When sending IPC messages, prevent new sockets from initializing.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Currently, the filter value is hardcoded and disconnected from the
actual value returned by eal_mp_socket_path(). Fix this by deriving
the filter value from eal_mp_socket_path() instead.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>