numam-dpdk

Author	SHA1	Message	Date
Andy Green	54a93341cc	eal: explicit cast of builtin for bsf32 rte_common.h:416:9: warning: conversion to 'uint32_t' {aka 'unsigned int'} from 'int' may change the sign of the result [-Wsign-conversion] return __builtin_ctz(v); ^~~~~~~~~~~~~~~~ The builtin is defined to return int, but we want to return it as uint32_t. Its only defined valid return values are positive integers or zero, which is OK for uint32_t. So just add an explicit cast. Fixes: `03f6bced5b` ("eal: use intrinsic function") Cc: stable@dpdk.org Signed-off-by: Andy Green <andy@warmcat.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>	2018-05-13 22:45:05 +02:00
Bruce Richardson	bd0d13a8c4	build: ensure compatibility with future meson versions Meson 0.46 fixed a bug where "extract_all_objects" would not recursively extract objects not compiled from source for a target. To keep backward compatibility, a "recursive" keyword-arg was added to make this optional. The value is "false" by default for now, but will change to "true" in future, so we hard-code it to "false" in our code to ensure future compatibility. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Luca Boccassi <bluca@debian.org>	2018-05-08 22:22:02 +02:00
Konstantin Ananyev	a93ff62a89	bpf: introduce basic Rx/Tx filters Introduce API to install BPF based filters on ethdev RX/TX path. Current implementation is pure SW one, based on ethdev RX/TX callback mechanism. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-05-12 00:36:34 +02:00
Konstantin Ananyev	cc752e43e0	bpf: add JIT compilation for x86_64 ISA Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-05-12 00:36:27 +02:00
Konstantin Ananyev	6e12ec4c4d	bpf: add more checks Add checks for: - all instructions are valid ones (known opcodes, correct syntax, valid reg/off/imm values, etc.) - no unreachable instructions - no loops - basic stack boundaries checks - division by zero Still need to add checks for: - use/return only initialized registers and stack data. - memory boundaries violation Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-05-12 00:35:23 +02:00
Konstantin Ananyev	5dba93ae5f	bpf: add ability to load eBPF program from ELF object file Introduce rte_bpf_elf_load() function to provide ability to load eBPF program from ELF object file. It also adds dependency on libelf. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-05-12 00:35:20 +02:00
Konstantin Ananyev	94972f35a0	bpf: add BPF loading and execution framework librte_bpf provides a framework to load and execute eBPF bytecode inside user-space dpdk based applications. It supports basic set of features from eBPF spec (https://www.kernel.org/doc/Documentation/networking/filter.txt). Not currently supported features: - JIT - cBPF - tail-pointer call - eBPF MAP - skb - function calls for 32-bit apps - mbuf pointer as input parameter for 32-bit apps Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-05-12 00:35:15 +02:00
Kamil Chalupnik	58a695c6ec	bbdev: split queue groups Splitting Queue Groups into UL/DL Groups in Turbo Software Driver. They are independent for Decode/Encode. Release note updated accordingly. Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com> Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>	2018-05-10 17:46:20 +01:00
Kamil Chalupnik	864edd6935	bbdev: measure offload cost New test created to measure offload cost. Changes were introduced in API, turbo software driver and test application Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com> Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>	2018-05-10 17:46:20 +01:00
Kamil Chalupnik	795ae2df4d	baseband/turbo_sw: support optional CRC overlap Support for optional CRC overlap in decode processing implemented in Turbo Software driver Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com> Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>	2018-05-10 17:46:20 +01:00
Kamil Chalupnik	47d5a04969	baseband/turbo_sw: scale likelihood ratio input Update Turbo Software driver for Wireless Baseband Device: - function scaling input LLR values to specific range [-16, 16] added - new test vectors to check device capabilities added - release note updated accordingly Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com> Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>	2018-05-10 17:46:20 +01:00
Kamil Chalupnik	6a1d032e79	baseband/turbo_sw: move macros to bbdev library Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com> Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>	2018-05-10 17:46:20 +01:00
Fiona Trahe	1466bafb9a	compressdev: get device id from name Added API to retrieve the device id provided the device name. Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:20 +01:00
Fiona Trahe	5d432f3640	compressdev: add device capabilities Added structure which each PMD will fill out, providing the capabilities of each driver (containing mainly which compression services it supports). Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:20 +01:00
Fiona Trahe	75736aa393	compressdev: add device stats Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:20 +01:00
Fiona Trahe	8f1e111539	compressdev: add compression service feature flags Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:20 +01:00
Fiona Trahe	f40d300a81	compressdev: add device feature flags Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:20 +01:00
Shally Verma	0d6717c437	compressdev: support hash operations - Added hash algo enumeration and params in xform and rte_comp_op - Updated compress/decompress xform to input hash algorithm - Updated struct rte_comp_op to input hash buffer User in capability query will know about support hashes via device info comp_feature_flag. If supported, application can initialize desired algorithm enumeration in xform structure and pass valid hash buffer during enqueue_burst(). Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Sunila Sahu <sunila.sahu@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:19 +01:00
Fiona Trahe	b342c57aae	compressdev: support stateful operations Added stream data (stream) in compression operation, which will contain the private data from each PMD to support stateful operations. Also, added functions to create/free this data. Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:19 +01:00
Fiona Trahe	32176b0285	compressdev: support stateless operations Added private transform data (priv_xform) in compression operation, which will contain the private data from each PMD to support stateless operations. Also, added functions to create/free this data. Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:19 +01:00
Fiona Trahe	96086db5a3	compressdev: add operation management Added functions to allocate and free compression operations. Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:19 +01:00
Fiona Trahe	63f4bfd532	compressdev: add enqueue/dequeue functions Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:19 +01:00
Fiona Trahe	f87bdc1ddc	compressdev: add compression specific data Added structures and enums specific to compression, including the compression operation structure and the different supported algorithms, checksums and compression levels. Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:19 +01:00
Fiona Trahe	24a0fef851	compressdev: add queue pair management Add functions to manage device queue pairs. Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:19 +01:00
Fiona Trahe	ed7dd94f7f	compressdev: add basic device management Add basic functions to manage compress devices, including driver and device allocation, and the basic interface with compressdev PMDs. Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com> Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>	2018-05-10 17:46:19 +01:00
Nikhil Rao	c2189c907d	eventdev: make ethdev port identifiers 16-bit Ethernet port ID data size has been extended to 16 bits since 17.11. Update the Rx event adapter interface and implementation accordingly. This commit bumps the library version to reflect the ABI change caused by extending the ethernet port parameter in Rx adapter functions from 8 to 16 bits. Fixes: `9c38b704d2` ("eventdev: add eth Rx adapter implementation") Cc: stable@dpdk.org Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>	2018-05-10 18:04:31 +02:00
Abhinandan Gujjar	7901eac340	eventdev: add crypto adapter implementation This patch adds common code for the crypto adapter to support SW and HW based transfer mechanisms. The adapter uses an EAL service core function for SW based packet transfer and uses the eventdev PMD functions to configure HW based packet transfer between the crypto device and the event device. This patch also adds adapter to the meson build system & updates the necessary makefile & map file. Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com> Signed-off-by: Nikhil Rao <nikhil.rao@intel.com> Signed-off-by: Gage Eads <gage.eads@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2018-05-10 14:08:46 +02:00
Abhinandan Gujjar	9dc1bd7326	eventdev: add driver interface of crypto adapter This patch defines capabilities & functions to be called for eventdev PMDs. Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2018-05-10 14:07:37 +02:00
Abhinandan Gujjar	dbe869baf4	eventdev: introduce event crypto adapter This patch introduces event crypto adapter APIs. It also provides information on working model/adapter modes & their usage. Application is expected to use this interface to transfer packets between the crypto device & the event device. Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com> Signed-off-by: Nikhil Rao <nikhil.rao@intel.com> Signed-off-by: Gage Eads <gage.eads@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2018-05-10 14:03:57 +02:00
Nikhil Rao	b2b8577da5	eventdev: convert eth Rx adapter files to SPDX license tag Signed-off-by: Nikhil Rao <nikhil.rao@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>	2018-05-10 14:03:20 +02:00
Jasvinder Singh	8ea4143883	table: add dedicated params struct for cuckoo hash Add dedicated parameter structure for cuckoo hash. The cuckoo hash from librte_hash uses slightly different prototype for the hash function (no key_mask parameter, 32-bit seed and return value) that require either of the following approaches: 1/ Function pointer conversion: gcc 8.1 warning [1], misleading [2] 2/ Union within the parameter structure: pollutes a very generic API parameter structure with some implementation dependent detail (i.e. key mask not available for one of the available implementations) 3/ Using opaque pointer for hash function: same issue from 2/ 4/ Different parameter structure: avoid issue from 2/; hopefully, it won't be long before librte_hash implements the key mask feature, so the generic API structure could be used. [1] http://www.dpdk.org/ml/archives/dev/2018-April/094950.html [2] http://www.dpdk.org/ml/archives/dev/2018-April/096250.html Fixes: `5a80bf0ae6` ("table: add cuckoo hash") Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2018-05-08 16:19:58 +02:00
Jasvinder Singh	4726fb245e	sched: add post-init pipe profile API Add new API function to add more pipe configuration profiles post initialization to the set of existing profiles specified during the creation of scheduler port. This API removes the current limitation that forces the user to define the full set of pipe profiles as part of port parameters while port is being created. Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>	2018-05-04 16:25:48 +02:00
Nikhil Rao	2fcf2f104f	ethdev: support WRED thresholds in bytes WRED thresholds can be specified in bytes if the TM leaf node supports it. Also extend WRED thresholds to 32 bits from 16. TM capability (port/level/queue) fields cman_wred_packet_mode_supported and cman_wred_byte_mode_supported, when non-zero, indicate support for WRED thresholds in packets and bytes respectively. The packet_mode member of struct rte_tm_wred_params, when non-zero, indicates that the min and max thresholds are specified in packets and when zero, indicates that the min and max thresholds are specified in bytes. Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>	2018-05-04 16:23:19 +02:00
Ben Shelton	50752f06d2	ethdev: fix TM API comment The rte_tm_node_wfq_weight_mode_update() API function operates on non-leaf nodes, not leaf nodes. Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>	2018-05-01 16:50:28 +02:00
Anatoly Burakov	0256386dc4	mem: add argument to memory event callback It may be useful to pass arbitrary data to the callback (such as device pointers), so add this to the mem event callback API. Suggested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-05-08 22:28:58 +02:00
Olivier Matz	5751ff40fe	mempool: fix alignment of memzone length when populating When populating a mempool with the default function, if there is not enough virtually contiguous memory for the whole mempool, it will be populated with several chunks. A chunk of the maximum available length is requested with: mz = rte_memzone_reserve_aligned(..., len=0, ..., align=x) If align is smaller than the page size, the address and the length of the memzone may not be a multiple of the page size. This makes rte_mempool_populate_virt() fail because it requires them to be page-aligned. This patch fixes that. The problem can be reproduced easily by allocating more than available memory: ./build/app/testpmd -l 0,1 -- --total-num-mbufs=65536 ... Cause: Creation of mbuf pool for socket 0 failed: Invalid argument After the patch, the error code is correct: ./build/app/testpmd -l 0,1 -- --total-num-mbufs=65536 ... Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory Fixes: `ba0009560c` ("mempool: support new allocation methods") Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-05-08 15:58:20 +02:00
Thomas Monjalon	7baac77594	version: 18.05-rc2 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-05-02 23:12:16 +02:00
Konstantin Ananyev	51c7de38e2	eal/x86: fix atomic exchange for 32-bit Should break out of loop when rte_atomic64_cmpset() returns non-zero. Fixes: `ff2863570f` ("eal: introduce atomic exchange operation") Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-05-02 19:23:06 +02:00
Anatoly Burakov	0db6d2782c	malloc: avoid padding elements on page deallocation Currently, when deallocating pages, malloc will fixup other elements' headers if there is not enough space to store a full element in leftover space. This leads to race conditions because there are some functions that check for pad size with an unlocked heap, expecting pad size to be constant. Fix it by being more conservative and only freeing pages when there is enough space before and after the page to store a free element. Fixes: `1403f87d4f` ("malloc: enable memory hotplug support") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-05-02 18:35:19 +02:00
Anatoly Burakov	dc14d4f026	malloc: set pad to 0 on free The pad value is not used unless element is in pad state, but it will show up in heap dumps and may be confusing. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-05-02 18:35:19 +02:00
Jianfeng Tan	3a0d465d4c	eal: fix use-after-free on control thread creation After the commit below, we encounter some strange issues: 1) Deadlock as described here: http://dpdk.org/ml/archives/dev/2018-April/099806.html 2) SIGSEGV issue when starting a testpmd in VM. Considering that the commit below changes to use dynamic memory instead of stack for the memory barrier, we suspect it's caused by use-after-free. Fixes: `3d09a6e26d` ("eal: fix threads block on barrier") Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reported-by: Lei Yao <lei.a.yao@intel.com> Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Suggested-by: Olivier Matz <olivier.matz@6wind.com> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>	2018-05-02 17:23:37 +02:00
Jianfeng Tan	e87923a9be	eal: fix memory leak on control thread failure params is not freed if pthread_create() fails. The fix is straight-forward. Fixes: `3d09a6e26d` ("eal: fix threads block on barrier") Reported-by: Olivier Matz <olivier.matz@6wind.com> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>	2018-05-02 17:15:02 +02:00
Ferruh Yigit	af7551e2bf	ethdev: remove error return on RSS hash check Many sample applications fail because of dev_info.flow_type_rss_offloads check in rte_eth_dev_configure() The sample applications need to be fixed/updated before returning error on rte_eth_dev_configure() and rte_eth_dev_rss_hash_update(). This patch keeps the error logs but removes returning errors. Fixes: `8863a1fbfc` ("ethdev: add supported hash function check") Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-05-01 17:55:15 +02:00
Anatoly Burakov	3eb9af3416	malloc: fix heap size not set on init When heap initializes, we need to add already allocated segments onto the heap. However, in doing that, we never increased total heap size. Fix it by adding segment length to total heap length when initializing the heap. Fixes: `66cc45e293` ("mem: replace memseg with memseg lists") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-30 15:33:49 +02:00
Anatoly Burakov	eb8d29f825	mem/linux: fix hugedir write deadlock At hugepage info initialization, EAL takes out a write lock on hugetlbfs directories, and drops it after the memory init is finished. However, in non-legacy mode, if "-m" or "--socket-mem" switches are passed, this leads to a deadlock because EAL tries to allocate pages (and thus take out a write lock on hugedir) while still holding a separate hugedir write lock in EAL. Fix it by checking if write lock in hugepage info is active, and not trying to lock the directory if the hugedir fd is valid. Fixes: `1a7dc2252f` ("mem: revert to using flock and add per-segment lockfiles") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Shahaf Shuler <shahafs@mellanox.com> Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-04-30 15:23:17 +02:00
Thomas Monjalon	fcde84b5f8	version: 18.05-rc1 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-04-28 00:26:04 +02:00
Anatoly Burakov	1a7dc2252f	mem: revert to using flock and add per-segment lockfiles The original implementation used flock() locks, but was later switched to using fcntl() locks for page locking, because fcntl() locks allow locking parts of a file, which is useful for single-file segments mode, where locking the entire file isn't as useful because we still need to grow and shrink it. However, according to fcntl()'s Ubuntu manpage [1], semantics of fcntl() locks have a giant oversight: This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process. This semantic means that applications must be aware of any files that a subroutine library may access. Basically, closing any fd with an fcntl() lock (which we do because we don't want to leak fd's) will drop the lock completely. So, in this commit, we will be reverting back to using flock() locks everywhere. However, that still leaves the problem of locking parts of a memseg list file in single file segments mode, and we will be solving it with creating separate lock files per each page, and tracking those with flock(). We will also be removing all of this tailq business and replacing it with a simple array - saving a few bytes is not worth the extra hassle of dealing with pointers and potential memory allocation failures. Also, remove the tailq lock since it is not needed - these fd lists are per-process, and within a given process, it is always only one thread handling access to hugetlbfs. So, first one to allocate a segment will create a lockfile, and put a shared lock on it. When we're shrinking the page file, we will be trying to take out a write lock on that lockfile, which would fail if any other process is holding onto the lockfile as well. This way, we can know if we can shrink the segment file. 
Also, if no other locks are found in the lock list for a given memseg list, the memseg list fd is automatically closed. One other thing to note is, according to flock() Ubuntu manpage [2], upgrading the lock from shared to exclusive is implemented by dropping and reacquiring the lock, which is not atomic and thus would have created race conditions. So, on attempting to perform operations in hugetlbfs, we will take out a writelock on hugetlbfs directory, so that only one process could perform hugetlbfs operations concurrently. [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html Fixes: `66cc45e293` ("mem: replace memseg with memseg lists") Fixes: `582bed1e1d` ("mem: support mapping hugepages at runtime") Fixes: `a5ff05d60f` ("mem: support unmapping pages at runtime") Fixes: `2a04139f66` ("eal: add single file segments option") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-27 23:52:51 +02:00
Anatoly Burakov	046aa5c447	mem: add memalloc init stage Currently, memseg lists for secondary process are allocated on sync (triggered by init), when they are accessed for the first time. Move this initialization to a separate init stage for memalloc. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-27 23:52:51 +02:00
Anatoly Burakov	1be7644986	mem: improve autodetection of hugepage counts on 32-bit For non-legacy mode, we are preallocating space for hugepages, so we know in advance which pages we will be able to allocate, and which we won't. However, the init procedure was using hugepage counts gathered from sysfs and paid no attention to hugepage sizes that were actually available for reservation, and failed on attempts to reserve unavailable pages. Fix this by limiting total page counts by number of pages actually preallocated. Also, VA preallocate procedure only looks at mountpoints that are available, and expects pages to exist if a mountpoint exists. That might not necessarily be the case, so also check if there are hugepages available for a particular page size on a particular NUMA node. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>	2018-04-27 23:52:51 +02:00
Anatoly Burakov	e82ca1a75e	mem: improve preallocation on 32-bit Previously, if we couldn't preallocate VA space on 32-bit for one page size, we simply bailed out, even though we could've tried allocating VA space with other page sizes. For example, if user had both 1G and 2M pages enabled, and has asked DPDK to allocate memory on both sockets, DPDK would've tried to allocate VA space for 1x1G page on both sockets, failed and never tried again, even though it could've allocated the same 1G of VA space for 512x2M pages. Fix this by retrying with different page sizes if VA space reservation failed. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>	2018-04-27 23:52:51 +02:00
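
Note on commit 54a93341cc (eal: explicit cast of builtin for bsf32): the message describes casting the int result of __builtin_ctz() to uint32_t to silence -Wsign-conversion. A minimal sketch of that pattern follows, assuming a GCC or Clang compiler. The function name bsf32_sketch is hypothetical and is not the DPDK identifier, and the assert() guard is an addition for illustration, since the builtin is undefined for a zero argument.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the helper touched by commit 54a93341cc.
 * Returns the index of the least significant set bit of v.
 */
static inline uint32_t
bsf32_sketch(uint32_t v)
{
	/* __builtin_ctz() is undefined for 0, so reject it up front
	 * (illustrative guard, not taken from the commit). */
	assert(v != 0);

	/* The builtin returns int, but its valid results are never
	 * negative, so the explicit cast is safe and only silences
	 * the -Wsign-conversion warning quoted in the commit. */
	return (uint32_t)__builtin_ctz(v);
}
```

For example, bsf32_sketch(8) yields 3, because bit 3 is the lowest set bit of 0b1000.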


4329 Commits