310 Commits

Author SHA1 Message Date
Xiao Wang
ea2dc10668 vfio: add multi container support
This patch adds APIs to support container create/destroy and device
bind/unbind with a container. It also provides an API for IOMMU
programming on a specified container.

A driver can use the "rte_vfio_container_create" helper to create a new
container from EAL, and "rte_vfio_container_group_bind" to bind a device
to the newly created container. During rte_vfio_setup_device, the
container bound to the device is used for IOMMU setup.
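
A minimal sketch of the intended flow, using only the helpers named
above (error handling simplified; the group number would normally be
derived from the device's sysfs path and is a placeholder here):

    #include <rte_vfio.h>

    static int
    setup_private_container(int iommu_group_num)
    {
        /* create a dedicated container instead of the default one */
        int container_fd = rte_vfio_container_create();

        if (container_fd < 0)
            return -1;
        /* bind the device's IOMMU group to the new container */
        if (rte_vfio_container_group_bind(container_fd, iommu_group_num) < 0) {
            rte_vfio_container_destroy(container_fd);
            return -1;
        }
        /* rte_vfio_setup_device() will later use the container bound
         * to the device for IOMMU setup */
        return container_fd;
    }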

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-27 15:54:55 +01:00
Thomas Monjalon
a5c9b9278c eal: fix build on FreeBSD
The auxiliary vector read is implemented only for Linux.
It could be done with procstat_getauxv() for FreeBSD.

Since the commit below, the auxiliary vector functions
are compiled for every architecture, including x86
which is tested with FreeBSD.

This patch moves the Linux implementation into the Linux directory,
and adds a fake/empty implementation for FreeBSD.

Fixes: 2ed9bf330709 ("eal: abstract away the auxiliary vector")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-04-27 11:13:59 +02:00
Olivier Matz
dec7b1884a use sizeof to avoid double use of a length define
Only a cosmetic change: the *_LEN defines are already used
when defining the buffer. Using sizeof() ensures that the length
stays consistent, even if the definition is modified.
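
For illustration, the pattern changes along these lines (the define,
buffer and source string below are hypothetical):

    #define HOSTNAME_LEN 256

    char name[HOSTNAME_LEN];

    /* before: the length define is repeated at every use */
    snprintf(name, HOSTNAME_LEN, "%s", src);

    /* after: the length follows the buffer definition automatically */
    snprintf(name, sizeof(name), "%s", src);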

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-25 00:51:31 +02:00
Jianfeng Tan
79967252c3 eal: bring forward multi-process channel init
Adjust the init sequence: put the multi-process channel init before
the bus scan, so that in the secondary process the vdev bus can be
initialized through the mp channel before the bus scan.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
2018-04-24 12:31:26 +02:00
Yangchao Zhou
fb338b80e5 mem: fix leaks of hugedir and replace snprintf
The hugedir returned by get_hugepage_dir is allocated by strdup
but never released. Also replace snprintf with the more suitable strlcpy.

Coverity issue: 272585
Fixes: cb97d93e9d3b ("mem: share hugepage info primary and secondary")

Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-18 10:58:10 +02:00
Anatoly Burakov
6e8a721044 vfio: export functions even when disabled
Previously, VFIO functions were not compiled in and exported if
VFIO compilation was disabled. Fix this by actually compiling
all of the functions unconditionally, and provide missing
prototypes on Linux.

Fixes: 279b581c897d ("vfio: expose functions")
Fixes: 73a639085938 ("vfio: allow to map other memory regions")
Fixes: 964b2f3bfb07 ("vfio: export some internal functions")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-16 19:33:46 +02:00
Jeff Guo
a753e53d51 eal: add device event monitor framework
This patch adds a general device event monitor framework at the
EAL device layer, for device hotplug awareness and the actions taken
accordingly. It could later be extended to other types of device
event monitoring, but that is out of scope at this stage.

To get started, users first call the newly added APIs below to
enable/disable the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users can register or unregister callbacks through the newly
added APIs. Callbacks can be device-specific, or apply to all devices.
  - rte_dev_event_callback_register
  - rte_dev_event_callback_unregister

Take the hotplug case as an example: on device hotplug insertion or
removal, we get notified by the kernel and then call the user's
callbacks accordingly to handle it, such as detaching or attaching
the device from/to the bus, which could further benefit fail-safe or
live-migration.
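
A minimal usage sketch of the new APIs; the callback signature and
enum values shown here are assumptions and may differ from the final
headers:

    #include <stdio.h>
    #include <rte_dev.h>

    /* hypothetical hotplug handler */
    static void
    hotplug_cb(const char *device_name, enum rte_dev_event_type event,
               void *arg)
    {
        if (event == RTE_DEV_EVENT_REMOVE)
            printf("device %s removed, detach it\n", device_name);
        else if (event == RTE_DEV_EVENT_ADD)
            printf("device %s added, attach it\n", device_name);
    }

    /* a NULL device name registers the callback for all devices */
    rte_dev_event_callback_register(NULL, hotplug_cb, NULL);
    rte_dev_event_monitor_start();
    /* ... */
    rte_dev_event_monitor_stop();
    rte_dev_event_callback_unregister(NULL, hotplug_cb, NULL);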

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
2018-04-13 12:00:31 +02:00
Hemant Agrawal
964b2f3bfb vfio: export some internal functions
This patch moves some of the internal vfio functions from
eal_vfio.h to rte_vfio.h for common uses with "rte_" prefix.

This patch also changes the FSLMC bus usages from the internal
VFIO functions to the external ones with "rte_" prefix.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-13 01:06:57 +02:00
Neil Horman
34fbfa585c mem: set fd to -1 for anonymous mmap
https://dpdk.org/tracker/show_bug.cgi?id=18

The report above indicated that several mmap call sites in the
[linux|bsd]app eal code set an fd that was not -1 in their calls while
using MAP_ANONYMOUS.  While probably not a huge deal, the man page does
say the fd should be -1 for portability, as some implementations don't
ignore fd as they should for MAP_ANONYMOUS.
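
An illustrative before/after of one such call site (addr and size are
placeholders):

    #include <sys/mman.h>

    /* before: a stray fd (e.g. 0) is passed despite MAP_ANONYMOUS */
    addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);

    /* after: fd is -1, as the man page recommends for portability */
    addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);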

Suggested-by: Solal Pirelli <solal.pirelli@gmail.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-04-12 14:44:24 +02:00
Anatoly Burakov
07dcbfe010 malloc: support multiprocess memory hotplug
This enables multiprocess synchronization for memory hotplug
requests at runtime (as opposed to initialization).

Basic workflow is the following. Primary process always does initial
mapping and unmapping, and secondary processes always follow primary
page map. Only one allocation request can be active at any one time.

When the primary allocates memory, it ensures that all other processes
have allocated the same set of hugepages successfully; otherwise any
allocations made are rolled back and the heap is freed back.
The heap is locked throughout the process, and there is also a global
memory hotplug lock, so no race conditions can happen.

When the primary frees memory, it frees the heap, deallocates the
affected pages, and notifies other processes of the deallocations.
Since the heap is freed from that memory chunk, the area basically
becomes invisible to other processes even if they happen to fail to
unmap that specific set of pages, so it's completely safe to ignore
the results of sync requests.

When a secondary allocates memory, it does not do so by itself.
Instead, it sends a request to the primary process to try and allocate
pages of the specified size and on the specified socket, such that a
specified heap allocation request can complete. The primary process
then sends all secondaries (including the requestor) a separate
notification of the allocated pages, and expects all secondary
processes to report success before considering the pages as "allocated".

Only after the primary process ensures that all memory has been
successfully allocated in all secondary processes will it respond
positively to the initial request and let the secondary proceed with
the allocation. Since the heap now has memory that can satisfy the
allocation request, and it was locked all this time (so no other
allocations could take place), the secondary process will be able to
allocate memory from the heap.

When a secondary frees memory, it hides the pages to be deallocated
from the heap. Then, it sends a deallocation request to the primary
process, so that it deallocates the pages itself, and then sends a
separate sync request to all other processes (including the requestor)
to unmap the same pages. This way, even if the secondary fails to
notify other processes of this deallocation, that memory will become
invisible to other processes, and will not be allocated from again.

So, to summarize: address space will only become part of the heap
if the primary process can ensure that all other processes have
allocated this memory successfully. If anything goes wrong, the
worst thing that could happen is that a page will "leak" and will
not be available to either DPDK or the system, as some process
will still hold onto it. It's not an actual leak, as we can account
for the page - it's just that none of the processes will be able
to use this page for anything useful, until it gets allocated
again by the primary.

Due to the underlying DPDK IPC implementation being single-threaded,
some asynchronous magic had to be done, as we need to complete
several requests before we can definitively allow a secondary process
to use the allocated memory (namely, it has to be present in all other
secondary processes before it can be used). Additionally, only
one allocation request is allowed to be submitted at a time.

Memory allocation requests are only allowed when there are no
secondary processes currently initializing. To enforce that,
a shared rwlock is used: it is taken as a read lock on init (so that
several secondaries can initialize concurrently), and as a write lock
when making allocation requests (so that either secondary init will
have to wait, or the allocation request will have to wait until all
processes have initialized).

Any other function that wishes to iterate over memory or prevent
allocations should use the memory hotplug lock.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
cb97d93e9d mem: share hugepage info primary and secondary
Since we are going to need to map hugepages in both primary and
secondary processes, we need to know where we should look for
hugetlbfs mountpoints. So, share those with secondary processes,
and map them on init.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
524e43c2ad mem: prepare memseg lists for multiprocess sync
In preparation for implementing multiprocess support, we are adding
a version number to memseg lists. We will not need any locks, because
memory hotplug will have a global lock (so any time memory map and
thus version number might change, we will already be holding a lock).

There are two ways of implementing multiprocess support for memory
hotplug: either all information about mapped memory is shared
between processes, and secondary processes simply attempt to
map/unmap memory based on requests from the primary, or secondary
processes store their own maps and only check if they are in sync
with the primary process' maps.

This implementation opts for the latter option: the primary process'
shared mappings will be authoritative, and each secondary process
will use its own internal view of mapped memory, and will attempt
to synchronize on these mappings using versioning.

Under this model, only primary process will decide which pages get
mapped, and secondary processes will only copy primary's page
maps and get notified of the changes via IPC mechanism (coming
in later commits).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
c8f73de36e mem: add function to check if memory is contiguous
For now, memory is always contiguous because legacy mem mode is
enabled unconditionally, but this function will be helpful down
the line when we implement support for allocating physically
non-contiguous memory. We can no longer guarantee physically
contiguous memory unless we're in legacy or IOVA_AS_VA mode, but
we can certainly try and see if we succeed.

In addition, this would be useful for e.g. PMDs that may allocate
chunks that are smaller than the page size but must not cross the
page boundary, in which case we will be able to accommodate that
request. This function will also support non-hugepage memory.
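
A hedged sketch of the underlying idea - walking a VA range page by
page and checking that the IOVA advances with it; the helper name and
exact logic here are illustrative, not the actual EAL function:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <rte_memory.h>

    static bool
    va_range_is_iova_contig(const void *start, size_t len, size_t pg_sz)
    {
        const uint8_t *va = start;
        rte_iova_t first = rte_mem_virt2iova(va);
        size_t off;

        for (off = pg_sz; off < len; off += pg_sz) {
            if (rte_mem_virt2iova(va + off) != first + off)
                return false;
        }
        return true;
    }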

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 21:45:55 +02:00
Anatoly Burakov
a5ff05d60f mem: support unmapping pages at runtime
This isn't used anywhere yet, but the support is now there. Also,
adding cleanup to allocation procedures, so that if we fail to
allocate everything we asked for, we can free all of it back.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:57:20 +02:00
Anatoly Burakov
582bed1e1d mem: support mapping hugepages at runtime
Nothing uses this code yet. The bulk of it is copied from old
memory allocation code (linuxapp eal_memory.c). We provide an
EAL-internal API to allocate either one page or multiple pages,
guaranteeing that we'll get contiguous VA for all of the pages
that we requested.

Not supported on FreeBSD.

Locking is done via fcntl() because that way, when it comes to
taking out write locks or unlocking on deallocation, we don't
have to keep the original fd's around. Plus, using fcntl() gives us
the ability to lock parts of a file, which is useful for single-file
segments, which are coming down the line.
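
A sketch of the fcntl() range-lock pattern (fd, offset and len are
placeholders; the real code also handles contention and errors):

    #include <fcntl.h>
    #include <unistd.h>

    /* lock only [offset, offset + len) of the hugepage file, so one
     * file can back many independently locked segments */
    struct flock lk = {
        .l_type = F_WRLCK,       /* or F_RDLCK for a shared lock */
        .l_whence = SEEK_SET,
        .l_start = offset,
        .l_len = len,
    };

    if (fcntl(fd, F_SETLK, &lk) == -1) {
        /* region is already locked by another process */
    }

    /* unlocking a range does not require the original lock state */
    lk.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &lk);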

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:56:37 +02:00
Anatoly Burakov
49df3db848 memzone: replace memzone array with fbarray
It's there, so we might as well use it. Some operations will be
sped up by that.

Since we have to allocate an fbarray for memzones, we have to do
it before we initialize the memory subsystem, because the latter,
in secondary processes, will (later) allocate more fbarrays than the
primary process, which would result in an inability to attach to the
memzone fbarray if we did it after the fact.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:56:30 +02:00
Anatoly Burakov
66cc45e293 mem: replace memseg with memseg lists
Before, we were aggregating multiple pages into one memseg, so the
number of memsegs was small. Now, each page gets its own memseg,
so the list of memsegs is huge. To accommodate the new memseg list
size and to keep the under-the-hood workings sane, the memseg list
is now not just a single list, but multiple lists. To be precise,
each hugepage size available on the system gets one or more memseg
lists, per socket.

In order to support dynamic memory allocation, we reserve all
memory in advance (unless we're in 32-bit legacy mode, in which
case we do not preallocate memory). As in, we do an anonymous
mmap() of the entire maximum size of memory per hugepage size, per
socket (which is limited to either RTE_MAX_MEMSEG_PER_TYPE pages or
RTE_MAX_MEM_MB_PER_TYPE megabytes worth of memory, whichever is the
smaller one), split over multiple lists (which are limited to
either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_MB_PER_LIST
megabytes per list, whichever is the smaller one). There is also
a global limit of CONFIG_RTE_MAX_MEM_MB megabytes, which is mainly
used for 32-bit targets to limit amounts of preallocated memory,
but can be used to place an upper limit on total amount of VA
memory that can be allocated by DPDK application.

So, for each hugepage size, we get (by default) up to 128G worth
of memory, per socket, split into chunks of up to 32G in size.
The address space is claimed at the start, in eal_common_memory.c.
The actual page allocation code is in eal_memalloc.c (Linux-only),
and largely consists of copied EAL memory init code.

Pages in the list are also indexed by address. That is, in order
to figure out where the page belongs, one can simply look at base
address for a memseg list. Similarly, figuring out IOVA address
of a memzone is a matter of finding the right memseg list, getting
offset and dividing by page size to get the appropriate memseg.
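
A sketch of that lookup arithmetic (variable names are illustrative,
not the actual EAL fields):

    #include <stdint.h>
    #include <stddef.h>

    /* locate the memseg index of an address within one memseg list */
    static size_t
    memseg_idx(const void *msl_base_va, size_t page_sz, const void *va)
    {
        return ((uintptr_t)va - (uintptr_t)msl_base_va) / page_sz;
    }
    /* the IOVA of the address is then that memseg's IOVA plus the
     * offset of the address within the page */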

This commit also removes rte_eal_dump_physmem_layout() call,
according to deprecation notice [1], and removes that deprecation
notice as well.

On 32-bit targets, due to limited VA space, DPDK will no longer
spread memory across different sockets like before. Instead, it will
(by default) allocate all of the memory on the socket where the
master lcore is. To override this behavior, --socket-mem must be used.

The rest of the changes are really ripple effects from the memseg
change - heap changes, compile fixes, and rewrites to support
fbarray-backed memseg lists. Due to earlier switch to _walk()
functions, most of the changes are simple fixes, however some
of the _walk() calls were switched to memseg list walk, where
it made sense to do so.

Additionally, we are also switching locks from flock() to fcntl().
Down the line, we will be introducing single-file segments option,
and we cannot use flock() locks to lock parts of the file. Therefore,
we will use fcntl() locks for legacy mem as well, in case someone is
unfortunate enough to accidentally start legacy mem primary process
alongside an already working non-legacy mem-based primary process.

[1] http://dpdk.org/dev/patchwork/patch/34002/

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:39 +02:00
Anatoly Burakov
c44d09811b eal: add shared indexed file-backed array
rte_fbarray is a simple indexed array stored in shared memory
via mapping files into memory. Rationale for its existence is the
following: since we are going to map memory page-by-page, there
could be quite a lot of memory segments to keep track of (for
smaller page sizes, page count can easily reach thousands). We
can't really make page lists truly dynamic and infinitely expandable,
because that involves reallocating memory (which is a big no-no in
multiprocess). What we can do instead is have a maximum capacity as
something really, really large, and decide at allocation time how
big the array is going to be. We map the entire file into memory,
which makes it possible to use fbarray as shared memory, provided
the structure itself is allocated in shared memory. Per-fbarray
locking is also used to avoid index data races (but not contents
data races - that is up to user application to synchronize).

In addition, understanding that we will frequently need to scan
this array for free space, and that iterating over the array linearly
can become slow, rte_fbarray provides facilities to index the array's
usage. The following use cases are covered:
 - find next free/used slot (useful either for adding new elements
   to fbarray, or walking the list)
 - find starting index for next N free/used slots (useful for when
   we want to allocate chunk of VA-contiguous memory composed of
   several pages)
 - find how many contiguous free/used slots there are, starting
   from specified index (useful for when we want to figure out
   how many pages we have until next hole in allocated memory, to
   speed up some bulk operations where we would otherwise have to
   walk the array and add pages one by one)

This is accomplished by storing a usage mask in-memory, right
after the data section of the array, and using some bit-level
magic to figure out the info we need.
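
A minimal usage sketch of the new API (element sizes and counts are
arbitrary, and the exact signatures may differ slightly):

    #include <stdint.h>
    #include <rte_fbarray.h>

    struct rte_fbarray arr;

    /* capacity is fixed at init time; the array is backed by a file */
    rte_fbarray_init(&arr, "example", 1024, sizeof(uint64_t));

    /* find and claim the next free slot */
    int idx = rte_fbarray_find_next_free(&arr, 0);
    uint64_t *elt = rte_fbarray_get(&arr, idx);
    *elt = 42;
    rte_fbarray_set_used(&arr, idx);

    /* starting index of 8 contiguous free slots (e.g. for a chunk of
     * VA-contiguous pages) */
    int start = rte_fbarray_find_next_n_free(&arr, 0, 8);

    /* how many contiguous used slots follow idx (e.g. pages until the
     * next hole)? */
    int n = rte_fbarray_find_contig_used(&arr, idx);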

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:21 +02:00
Anatoly Burakov
182cf0c28d eal: add legacy memory option
This adds a "--legacy-mem" command-line switch. It will be used to
go back to the old memory behavior, one where we can't dynamically
allocate/free memory (the downside), but one where the user can
get physically contiguous memory, like before (the upside).

For now, nothing but the legacy behavior exists; the non-legacy
memory init sequence will be added later. For FreeBSD, non-legacy
memory init will never be enabled, while for Linux, it is
disabled in this patch to avoid breaking bisect, but will be
enabled once non-legacy mode is fully operational.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:13 +02:00
Anatoly Burakov
73a6390859 vfio: allow to map other memory regions
Currently it is not possible to use memory that is not owned by DPDK to
perform DMA. This scenario might occur in vhost applications (like
SPDK) where the guest sends its own memory table. To fill this gap,
provide an API to allow registering an arbitrary address range in the
VFIO container.
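
A sketch of how an application could register externally owned memory
for DMA, assuming the new API takes a VA, an IOVA and a length
(addresses below are placeholders):

    #include <stdint.h>
    #include <rte_vfio.h>

    /* memory obtained outside of DPDK, e.g. a guest memory region */
    uint64_t vaddr = (uint64_t)(uintptr_t)ext_mem;
    uint64_t iova  = vaddr;            /* use the VA as IOVA here */
    uint64_t len   = ext_mem_len;

    if (rte_vfio_dma_map(vaddr, iova, len) < 0) {
        /* mapping into the VFIO container failed */
    }
    /* ... perform DMA ... */
    rte_vfio_dma_unmap(vaddr, iova, len);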

Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:55:10 +02:00
Anatoly Burakov
221b67bca0 eal: use memseg walk instead of iteration
Reduce dependency on internal details of EAL memory subsystem, and
simplify code.
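
For reference, a hedged sketch of the walk-style usage that replaces
open-coded iteration over the memseg array (callback signature
assumed):

    #include <stdint.h>
    #include <rte_memory.h>

    /* called once per memseg; returning non-zero stops the walk */
    static int
    add_seg_len(const struct rte_memseg *ms, void *arg)
    {
        uint64_t *total = arg;

        *total += ms->len;
        return 0;
    }

    uint64_t total_len = 0;
    rte_memseg_walk(add_seg_len, &total_len);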

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 19:48:15 +02:00
Anatoly Burakov
952b207772 eal: provide API for querying valid socket ids
During lcore scan, find all socket IDs and store them, and
provide a public API to query valid socket IDs. This will break
the ABI, so bump the ABI version.

Also, remove deprecation notice corresponding to this change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-05 00:27:13 +02:00
Hemant Agrawal
acaa9ee991 move kernel modules directories
This patch moves the kernel modules code from EAL to a common place.
 - Separate the kernel module code from user space code.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
2018-03-21 23:04:21 +01:00
Bruce Richardson
2f90543f23 eal/bsd: fix kernel modules build with meson
The kernel module source file directory passed via VPATH was wrong,
which caused the source files to be not found via make. Rather than
explicitly passing VPATH, make use of the fact that the full path
to the source files is passed by meson, so split that into directory
part - to be used as VPATH - and file part - to be used as the source
filename.

Fixes: 610beca42ea4 ("build: remove library special cases")

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-02-02 11:28:52 +01:00
Nipun Gupta
028e4b1dbc mbuf: fix logic of user mempool ops API
The existing rte_eal_mbuf_default_mempool_ops can return the compile
time default ops name if the user has not provided a mempool ops name
on the command line. This breaks the logic of best mempool ops, as it
will never return the platform hw mempool ops.

This patch introduces a new API to just return the user mempool ops only.

Fixes: 8b0f7f434132 ("mbuf: maintain user and compile time mempool ops name")

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2018-02-06 01:02:12 +01:00
Olivier Matz
5c7472135b eal: use SPDX tags in 6WIND copyrighted files
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-02-01 02:32:41 +01:00
Pavan Nikhilesh
fe06cb6c54 eal: fix default mempool ops
If '--mbuf-pool-ops' is not passed to EAL as a command line argument, then
rte_eal_mbuf_default_mempool_ops will return NULL.

Instead, check if internal_config.user_mbuf_pool_ops_name is NULL and
return the compile-time RTE_MBUF_DEFAULT_MEMPOOL_OPS.
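
A simplified sketch of the fixed logic:

    /* fall back to the compile-time default only when the user did
     * not pass --mbuf-pool-ops */
    const char *
    rte_eal_mbuf_default_mempool_ops(void)
    {
        if (internal_config.user_mbuf_pool_ops_name == NULL)
            return RTE_MBUF_DEFAULT_MEMPOOL_OPS;
        return internal_config.user_mbuf_pool_ops_name;
    }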

Fixes: 8b0f7f43413 ("mbuf: maintain user and compile time mempool ops name")

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-01-31 01:00:16 +01:00
Bruce Richardson
6c9457c279 build: replace license text with SPDX tag
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
610beca42e build: remove library special cases
The EAL and compat libraries were special cases in the library build
process, the former because of its complexity, and the latter because
it only consists of a single header file.

By reworking the EAL meson.build files, we can eliminate the need for it to
be a special case, by having it build up and return the list of sources,
headers, and objects and return those to the higher level build file. This
should also simplify the building of EAL, as we can eliminate a number of
meson.build files that would no longer be needed, and have fewer, but
larger meson.build files (9 now vs 14 previous) - thereby making the logic
easier to follow and items easier to find.

Once done, we can pull EAL into the main library loop, with some
modifications to support it. Compat can also be pulled in once we add
a check to handle the case of an empty sources list.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2018-01-30 21:58:59 +01:00
Bruce Richardson
90434f6c2f eal/bsd: build modules with meson
Support compiling the FreeBSD kernel modules using meson and ninja.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Harry van Haaren <harry.van.haaren@intel.com>
2018-01-30 21:58:59 +01:00
Bruce Richardson
7b67398e60 build: add option to version libs using DPDK version
Normally, each library has its own version number based on the ABI.
Add an option to have all libs just use the DPDK version number as the
.so version.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
2018-01-30 17:49:16 +01:00
Bruce Richardson
844514c735 eal: build with meson
Support building the EAL with meson and ninja. This involves a number of
different meson.build files for iterating through all the different
subdirectories in the EAL. The library itself will be compiled on build but
the header files are only copied from their initial location once "ninja
install" is run. Instead, we use meson dependency tracking to ensure that
other libraries which use the EAL headers can find them in their original
locations.

Note: this does not include building kernel modules on either BSD or Linux

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
2018-01-30 17:49:16 +01:00
Jianfeng Tan
bacaa27540 eal: add channel for multi-process communication
Previously, there were three channels for multi-process
(i.e., primary/secondary) communication.
  1. Config-file based channel, in which, the primary process writes
     info into a pre-defined config file, and the secondary process
     reads the info out.
  2. vfio submodule has its own channel based on unix socket for the
     secondary process to get container fd and group fd from the
     primary process.
  3. pdump submodule also has its own channel based on unix socket for
     packet dump.

It'd be good to have a generic communication channel for multi-process
communication to accommodate the requirements including:
  a. Secondary wants to send info to primary, for example, secondary
     would like to send a request (about some specific vdev) to the primary.
  b. Sending info at any time, instead of just initialization time.
  c. Share FDs with the other side, for vdev like vhost, related FDs
     (memory region, kick) should be shared.
  d. A send message request needs the other side to respond immediately.

This patch proposes to create a communication channel, based on a
datagram unix socket, for the above requirements. Each process will
block on a unix socket waiting for messages from its peers.

Three new APIs are added:

  1. rte_eal_mp_action_register() is used to register an action,
     indexed by a string, when a component at the receiver side would
     like to respond to the messages from the peer process.
  2. rte_eal_mp_action_unregister() is used to unregister the action
     if the calling component does not want to respond to the messages.
  3. rte_eal_mp_sendmsg() is used to send a message, and returns
     immediately. If there are n secondary processes, the primary
     process will send n messages.
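
A usage sketch of these APIs; the message structure and handler
signature below are simplified assumptions, not the exact definitions:

    /* hypothetical handler for an action named "vdev_request" */
    static int
    handle_vdev_request(const void *msg, const void *peer)
    {
        /* act on the request, replying if needed */
        return 0;
    }

    /* receiver side: register the action, indexed by a string */
    rte_eal_mp_action_register("vdev_request", handle_vdev_request);

    /* sender side: msg is a filled-in request (type omitted here);
     * the call returns immediately, and a primary sends one message
     * per secondary process */
    rte_eal_mp_sendmsg(&msg);

    /* stop responding to this action */
    rte_eal_mp_action_unregister("vdev_request");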

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-30 15:09:42 +01:00
Neil Horman
a6ec31597a mk: add experimental tag check
Add checks during build to ensure that all symbols in the EXPERIMENTAL
version map section have __experimental tags on their definitions, and
enable the warnings needed to announce their use.  Also add an
ALLOW_EXPERIMENTAL_APIS define to allow individual libraries and files
to declare the acceptability of experimental api usage

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-01-29 23:35:29 +01:00
Neil Horman
77b7b81e32 add experimental tag to appropriate functions
Append the __rte_experimental tag to API calls appearing in the
EXPERIMENTAL section of their libraries' version map.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-01-29 23:35:29 +01:00
Harry van Haaren
aec9c13c52 eal: add function to release internal resources
This commit adds a new function rte_eal_cleanup().
The function serves as a hook to allow DPDK to release
internal resources (e.g.: hugepage allocations).

This function allows DPDK to become more like an ordinary
library, where the library context itself can be initialized
and cleaned up by the application.
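
A minimal sketch of the library-style lifecycle this enables:

    #include <rte_eal.h>

    int
    main(int argc, char **argv)
    {
        if (rte_eal_init(argc, argv) < 0)
            return -1;

        /* ... application work ... */

        /* release hugepages and other EAL-internal resources */
        rte_eal_cleanup();
        return 0;
    }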

The rte_exit() and rte_panic() functions must be considered,
particularly if they should call rte_eal_cleanup() to release any
resources or not. This patch adds the cleanup to rte_exit(),
but does not clean up on rte_panic(). The reason not to clean
up on panicking is that the developer may wish to inspect the
exact internal state of EAL and hugepages.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Vipin Varghese <vipin.varghese@intel.com>
2018-01-29 20:33:53 +01:00
Hemant Agrawal
96fd032ba8 eal: prefix mbuf pool ops name with user defined
This patch prefixes the mbuf pool ops name with "user" to indicate
that it is user defined.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
2018-01-29 18:52:07 +01:00
Pavan Nikhilesh
0b037e8b02 eal: introduce integer divide through reciprocal
In some use cases of integer division, the denominator remains constant
and the numerator varies. It is possible to optimize division for such
specific scenarios.

The librte_sched library uses rte_reciprocal to optimize division, so
moving it to eal/common would allow other libraries and applications
to use it.
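
A usage sketch, assuming the rte_reciprocal API as used by
librte_sched (a value is precomputed once for the constant divisor,
then divisions become multiply-and-shift):

    #include <stdint.h>
    #include <rte_reciprocal.h>

    uint32_t divisor = 1000;     /* constant denominator */
    uint32_t n = 123456;         /* varying numerator */

    struct rte_reciprocal r = rte_reciprocal_value(divisor);
    uint32_t q = rte_reciprocal_divide(n, r);    /* ~= n / divisor */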

Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
2018-01-27 22:34:33 +01:00
Moti Haimovsky
6817219581 vfio: fix FreeBSD build
This patch fixes the following compilation errors in bsdapp

lib/librte_eal/bsdapp/eal/eal.c:782:5:
error: no previous prototype for function 'rte_vfio_clear_group'
int rte_vfio_clear_group(int vfio_group_fd)
    ^

lib/librte_eal/bsdapp/eal/eal.c:782:30:
error: unused parameter 'vfio_group_fd'
int rte_vfio_clear_group(int vfio_group_fd)
                             ^

Fixes: c564a2a20093 ("vfio: expose clear group function for internal usages")

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
2018-01-17 18:49:38 +01:00
Hemant Agrawal
c564a2a200 vfio: expose clear group function for internal usages
Other VFIO-based modules, e.g. fslmc, will also need to use
the clear_group call.
So, expose it and rename it to *rte_vfio_clear_group*.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-01-17 00:43:04 +01:00
Thomas Monjalon
8f40ee0734 eal/x86: get hypervisor name
The CPUID instruction is caught by the hypervisor, which can return
a flag indicating that one is running, and its name.
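
A generic sketch of the detection (not the EAL code): the
hypervisor-present flag is ECX bit 31 of CPUID leaf 1, and leaf
0x40000000 returns the vendor string in EBX/ECX/EDX:

    #include <cpuid.h>
    #include <stdio.h>
    #include <string.h>

    static void
    print_hypervisor_name(void)
    {
        unsigned int eax, ebx, ecx, edx;
        char name[13];

        __cpuid(1, eax, ebx, ecx, edx);
        if (!(ecx & (1u << 31))) {
            printf("running on bare metal\n");
            return;
        }

        __cpuid(0x40000000, eax, ebx, ecx, edx);
        memcpy(name, &ebx, 4);
        memcpy(name + 4, &ecx, 4);
        memcpy(name + 8, &edx, 4);
        name[12] = '\0';
        printf("hypervisor: %s\n", name);   /* e.g. "KVMKVMKVM" */
    }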

Suggested-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
2018-01-12 00:39:14 +01:00
Michael McConville
b45056be04 mem: fix mmap error check on huge page attach
mmap(2) returns MAP_FAILED, not NULL, on failure.
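
An illustrative before/after of the check (addr is a placeholder):

    /* wrong: mmap() never returns NULL on failure */
    if (addr == NULL)
        goto error;

    /* right */
    if (addr == MAP_FAILED)
        goto error;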

Signed-off-by: Michael McConville <mmcco@mykolab.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-01-09 16:59:50 +01:00
Kefu Chai
4a386cfead contigmem: fix build on FreeBSD 12
Include <sys/vmmeter.h> to fix the build;
otherwise the build fails with FreeBSD 12, like:

In file included from contigmem.c:57:
/usr/srcs/head/src/sys/vm/vm_phys.h:122:10: error:
use of undeclared identifier 'vm_cnt'
        return (vm_cnt.v_free_count += adj);
                ^

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-01-09 16:52:16 +01:00
Bruce Richardson
369991d997 lib: use SPDX tag for Intel copyright files
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-01-04 22:41:39 +01:00
Jianfeng Tan
d4a586d29e bus/vdev: move code from EAL into a new driver
Move the vdev bus from lib/librte_eal to drivers/bus.

As the crypto vdev helper functions refer to data structures
in rte_vdev.h, we move those helper functions into drivers/bus
too.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
2017-11-07 16:54:07 +01:00
Xiaoyun Li
d35cc1fe6a eal/x86: revert select optimized memcpy at run-time
Revert the run-time linking support patchset, including the following
3 commits:

Fixes: 84cc318424d4 ("eal/x86: select optimized memcpy at run-time")
Fixes: c7fbc80fe60f ("test: select memcpy alignment unit at run-time")
Fixes: 5f180ae32962 ("efd: move AVX2 lookup in its own compilation unit")

The patchset would cause a perf drop in the vhost/virtio loopback
performance test, because the run-time dispatch must cost at least a
function call compared to the compile-time dispatch, and the reference
cpu cycles value is small. In the test, when using 128-256 byte packets,
it would cause a 16%-20% perf drop with the mergeable path. When using
256 byte packets, it would cause a 13% perf drop with the vector path.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
2017-11-07 01:16:03 +01:00
Thomas Monjalon
b0eca11631 mempool: rename address mapping function to IOVA
The function rte_mempool_virt2phy() is renamed to rte_mempool_virt2iova().
The new function has one less parameter because it is unused.
The deprecated function is kept as an alias to avoid breaking the API.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2017-11-06 22:26:13 +01:00
Thomas Monjalon
62196f4e09 mem: rename address mapping function to IOVA
The function rte_mem_virt2phy() is kept and used in functions which
work only with physical addresses.
For all other calls, this function is replaced by rte_mem_virt2iova()
which does a direct mapping (no conversion) in the VA case.

Note: the new function rte_mem_virt2iova() function matches the
behaviour implemented in rte_mem_virt2phy() by the commit
680f6c12600f ("mem: honor IOVA mode in virt2phy")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
2017-11-06 22:24:19 +01:00
Santosh Shukla
7ba49d39f1 mem: rename segment address from physical to IOVA
Rename rte_memseg {.phys_addr} to {.iova}.
Keep the deprecated name in an anonymous union to avoid breaking
the API.

Use rte_iova_t and RTE_BAD_IOVA where appropriate in
memory segment handling.
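
A simplified sketch of the anonymous-union pattern used to keep the
old field name (abridged from the real struct rte_memseg):

    struct rte_memseg {
        RTE_STD_C11
        union {
            phys_addr_t phys_addr;  /* deprecated name, kept for API */
            rte_iova_t iova;        /* new name */
        };
        /* ... other fields ... */
    };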

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2017-11-06 22:23:41 +01:00
Thomas Monjalon
4c00cfdc0e remove useless memzone includes
The memzone header is often included without good reason.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2017-11-06 22:12:08 +01:00