Meson 0.46 fixed a bug where "extract_all_objects" would not recursively
extract objects not compiled from source for a target. To keep backward
compatibility, a "recursive" keyword-arg was added to make this optional.
The value is "false" by default for now, but will change to "true" in
future, so we hard-code it to "false" in our code to ensure future
compatibility.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Introduce an API to install BPF-based filters on the ethdev RX/TX path.
The current implementation is a pure software one, based on the ethdev
RX/TX callback mechanism.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add checks for:
- all instructions are valid ones
(known opcodes, correct syntax, valid reg/off/imm values, etc.)
- no unreachable instructions
- no loops
- basic stack boundary checks
- division by zero
Still need to add checks for:
- use/return of only initialized registers and stack data
- memory boundary violations
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Introduce the rte_bpf_elf_load() function to provide the ability to
load an eBPF program from an ELF object file.
This also adds a dependency on libelf.
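A minimal usage sketch, assuming the rte_bpf_prm layout as introduced by
this series ("filter.o" and the ".text" section name are placeholders):
    #include <stdio.h>
    #include <rte_bpf.h>
    #include <rte_errno.h>
    /* describe the single raw 8-byte argument the program will receive */
    struct rte_bpf_prm prm = {
            .prog_arg = { .type = RTE_BPF_ARG_RAW, .size = sizeof(uint64_t) },
    };
    struct rte_bpf *bpf = rte_bpf_elf_load(&prm, "filter.o", ".text");
    if (bpf == NULL)
            printf("load failed: %d\n", rte_errno);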
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space DPDK-based applications.
It supports a basic set of features from the eBPF spec
(https://www.kernel.org/doc/Documentation/networking/filter.txt).
Features not currently supported:
- JIT
- cBPF
- tail-pointer call
- eBPF MAP
- skb
- function calls for 32-bit apps
- mbuf pointer as input parameter for 32-bit apps
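A rough execution-side sketch, assuming a program loaded as above that takes
a raw 8-byte input (local names are illustrative):
    uint64_t data = 0;                       /* argument handed to the program */
    uint64_t rc = rte_bpf_exec(bpf, &data);  /* rc is the program's R0 result */
    rte_bpf_destroy(bpf);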
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Split queue groups into UL/DL groups in the Turbo software driver.
They are independent for decode/encode.
The release notes are updated accordingly.
Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com>
Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
A new test is created to measure offload cost.
Changes were introduced in the API, the Turbo software driver
and the test application.
Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com>
Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
Support for optional CRC overlap in decode processing is implemented
in the Turbo software driver.
Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com>
Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
Update the Turbo software driver for the Wireless Baseband device:
- a function scaling input LLR values to the range [-16, 16] is added
- new test vectors to check device capabilities are added
- release notes are updated accordingly
Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com>
Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>
Add an API to retrieve the device id given the device name.
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com>
Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>
Add a structure which each PMD will fill out,
providing the capabilities of each driver
(mainly containing which compression services
it supports).
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com>
Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>
- Add hash algorithm enumeration and parameters in the xform and rte_comp_op
- Update the compress/decompress xform to input a hash algorithm
- Update struct rte_comp_op to input a hash buffer
Through the capability query, the user learns which hashes are supported via
the device info comp_feature_flags. If supported, the application can set the
desired algorithm enumeration in the xform structure and pass a valid hash
buffer during enqueue_burst().
Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com>
Signed-off-by: Sunila Sahu <sunila.sahu@caviumnetworks.com>
Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>
Added stream data (stream) in compression operation,
which will contain the private data from each PMD
to support stateful operations.
Also, added functions to create/free this data.
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com>
Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>
Added private transform data (priv_xform) in compression
operation, which will contain the private data from each
PMD to support stateless operations.
Also, added functions to create/free this data.
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com>
Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>
Added structures and enums specific to compression,
including the compression operation structure and the
different supported algorithms, checksums and compression
levels.
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com>
Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>
Add basic functions to manage compress devices,
including driver and device allocation, and the basic
interface with compressdev PMDs.
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Signed-off-by: Shally Verma <shally.verma@caviumnetworks.com>
Signed-off-by: Ashish Gupta <ashish.gupta@caviumnetworks.com>
The Ethernet port ID data size has been extended to 16 bits since 17.11.
Update the Rx event adapter interface and implementation accordingly.
This commit bumps the library version to reflect the ABI change
caused by extending the Ethernet port parameter in the Rx adapter
functions from 8 to 16 bits.
Fixes: 9c38b704d2 ("eventdev: add eth Rx adapter implementation")
Cc: stable@dpdk.org
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
This patch adds common code for the crypto adapter to support
SW and HW based transfer mechanisms. The adapter uses an EAL
service core function for SW based packet transfer and uses
the eventdev PMD functions to configure HW based packet
transfer between the crypto device and the event device.
This patch also adds the adapter to the meson build system and
updates the necessary makefile and map file.
Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
This patch defines capabilities & functions to be called
for eventdev PMDs.
Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
This patch introduces the event crypto adapter APIs. It
also provides information on the working model/adapter
modes and their usage. Applications are expected to use
this interface to transfer packets between the crypto
device and the event device.
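A hedged creation sketch (enum and function names as introduced by this
series; the port configuration values are placeholders):
    #include <rte_event_crypto_adapter.h>
    struct rte_event_port_conf conf = {
            .new_event_threshold = 1024,
            .dequeue_depth = 16,
            .enqueue_depth = 16,
    };
    uint8_t adapter_id = 0, evdev_id = 0;
    rte_event_crypto_adapter_create(adapter_id, evdev_id, &conf,
                                    RTE_EVENT_CRYPTO_ADAPTER_OP_NEW);
    rte_event_crypto_adapter_start(adapter_id);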
Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Add a dedicated parameter structure for the cuckoo hash. The cuckoo hash from
librte_hash uses a slightly different prototype for the hash function (no
key_mask parameter, 32-bit seed and return value) that requires one
of the following approaches:
1/ Function pointer conversion: gcc 8.1 warning [1], misleading [2]
2/ Union within the parameter structure: pollutes a very generic API
parameter structure with some implementation dependent detail
(i.e. key mask not available for one of the available
implementations)
3/ Using opaque pointer for hash function: same issue from 2/
4/ Different parameter structure: avoid issue from 2/; hopefully,
it won't be long before librte_hash implements the key mask feature,
so the generic API structure could be used.
[1] http://www.dpdk.org/ml/archives/dev/2018-April/094950.html
[2] http://www.dpdk.org/ml/archives/dev/2018-April/096250.html
Fixes: 5a80bf0ae6 ("table: add cuckoo hash")
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Add a new API function to add more pipe configuration profiles
post-initialization to the set of existing profiles specified during
the creation of the scheduler port.
This API removes the current limitation that forces the user
to define the full set of pipe profiles as part of the port parameters
while the port is being created.
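A minimal sketch of the new call (hedged; the pipe parameter fields are
elided and "port" is an already created struct rte_sched_port pointer):
    struct rte_sched_pipe_params pp = { 0 };  /* tb_rate, tc_rate[], etc. */
    uint32_t profile_id;
    int ret = rte_sched_port_pipe_profile_add(port, &pp, &profile_id);
    /* on success, profile_id can be used when configuring pipes at runtime */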
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
WRED thresholds can be specified in bytes if the TM leaf
node supports it. Also extend WRED thresholds to 32 bits from 16.
TM capability (port/level/queue) fields cman_wred_packet_mode_supported and
cman_wred_byte_mode_supported, when non-zero, indicate support for WRED
thresholds in packets and bytes respectively.
The packet_mode member of struct rte_tm_wred_params, when non-zero,
indicates that the min and max thresholds are specified in
packets and when zero, indicates that the min and max thresholds
are specified in bytes.
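A hedged byte-mode sketch (field names as in rte_tm.h at the time of this
change; threshold values are illustrative):
    struct rte_tm_wred_params wp = {
            .red_params[RTE_TM_COLOR_GREEN] = {
                    .min_th = 64 * 1024,   /* bytes, now 32-bit wide */
                    .max_th = 256 * 1024,
                    .maxp_inv = 10,
                    .wq_log2 = 9,
            },
            .packet_mode = 0,  /* 0: thresholds in bytes, non-zero: packets */
    };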
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
The rte_tm_node_wfq_weight_mode_update() API function operates on
non-leaf nodes, not leaf nodes.
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
It may be useful to pass arbitrary data to the callback (such
as device pointers), so add this to the mem event callback API.
Suggested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When populating a mempool with the default function, if there is not
enough virtually contiguous memory for the whole mempool, it will be
populated with several chunks. A chunk of the maximum available length
is requested with:
mz = rte_memzone_reserve_aligned(..., len=0, ..., align=x)
If align is smaller than the page size, the address and the length of
the memzone may not be a multiple of the page size. This makes
rte_mempool_populate_virt() fail because it requires them to be
page-aligned. This patch fixes that.
The problem can be reproduced easily by allocating more than available
memory:
./build/app/testpmd -l 0,1 -- --total-num-mbufs=65536
...
Cause: Creation of mbuf pool for socket 0 failed: Invalid argument
After the patch, the error code is correct:
./build/app/testpmd -l 0,1 -- --total-num-mbufs=65536
...
Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory
Fixes: ba0009560c ("mempool: support new allocation methods")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Currently, when deallocating pages, malloc will fix up other
elements' headers if there is not enough space to store a full
element in the leftover space. This leads to race conditions because
there are some functions that check for pad size with an unlocked
heap, expecting pad size to be constant.
Fix it by being more conservative and only freeing pages when
there is enough space before and after the page to store a free
element.
Fixes: 1403f87d4f ("malloc: enable memory hotplug support")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
The pad value is not used unless the element is in a pad state, but it
will show up in heap dumps and may be confusing.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
After the commit below, we encounter some strange issues:
1) Deadlock as described here:
http://dpdk.org/ml/archives/dev/2018-April/099806.html
2) SIGSEGV when starting testpmd in a VM.
Considering the commit below changed to use dynamic memory instead of
the stack for the memory barrier, we suspect it is caused by use-after-free.
Fixes: 3d09a6e26d ("eal: fix threads block on barrier")
Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reported-by: Lei Yao <lei.a.yao@intel.com>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
params is not freed if pthread_create() fails. The fix is
straightforward.
Fixes: 3d09a6e26d ("eal: fix threads block on barrier")
Reported-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Many sample applications fail because of the
dev_info.flow_type_rss_offloads check in rte_eth_dev_configure().
The sample applications need to be fixed/updated before returning an error
from rte_eth_dev_configure() and rte_eth_dev_rss_hash_update().
This patch keeps the error logs but no longer returns errors.
Fixes: 8863a1fbfc ("ethdev: add supported hash function check")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
When heap initializes, we need to add already allocated segments
onto the heap. However, in doing that, we never increased total
heap size. Fix it by adding segment length to total heap length
when initializing the heap.
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
At hugepage info initialization, EAL takes out a write lock on
hugetlbfs directories, and drops it after the memory init is
finished. However, in non-legacy mode, if "-m" or "--socket-mem"
switches are passed, this leads to a deadlock because EAL tries
to allocate pages (and thus take out a write lock on hugedir)
while still holding a separate hugedir write lock in EAL.
Fix it by checking if write lock in hugepage info is active, and
not trying to lock the directory if the hugedir fd is valid.
Fixes: 1a7dc2252f ("mem: revert to using flock and add per-segment lockfiles")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Shahaf Shuler <shahafs@mellanox.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
The original implementation used flock() locks, but was later
switched to using fcntl() locks for page locking, because
fcntl() locks allow locking parts of a file, which is useful
for single-file segments mode, where locking the entire file
isn't as useful because we still need to grow and shrink it.
However, according to fcntl()'s Ubuntu manpage [1], semantics of
fcntl() locks have a giant oversight:
This interface follows the completely stupid semantics of System
V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
locks associated with a file for a given process are removed
when any file descriptor for that file is closed by that process.
This semantic means that applications must be aware of any files
that a subroutine library may access.
Basically, closing *any* fd with an fcntl() lock (which we do because
we don't want to leak fd's) will drop the lock completely.
So, in this commit, we will be reverting back to using flock() locks
everywhere. However, that still leaves the problem of locking parts
of a memseg list file in single file segments mode, and we will be
solving it by creating a separate lock file per page, and
tracking those with flock().
We will also be removing all of this tailq business and replacing it
with a simple array - saving a few bytes is not worth the extra
hassle of dealing with pointers and potential memory allocation
failures. Also, remove the tailq lock since it is not needed - these
fd lists are per-process, and within a given process, it is always
only one thread handling access to hugetlbfs.
So, first one to allocate a segment will create a lockfile, and put
a shared lock on it. When we're shrinking the page file, we will be
trying to take out a write lock on that lockfile, which would fail if
any other process is holding onto the lockfile as well. This way, we
can know if we can shrink the segment file. Also, if no other locks
are found in the lock list for a given memseg list, the memseg list
fd is automatically closed.
One other thing to note is, according to flock() Ubuntu manpage [2],
upgrading the lock from shared to exclusive is implemented by dropping
and reacquiring the lock, which is not atomic and thus would have
created race conditions. So, on attempting to perform operations in
hugetlbfs, we will take out a writelock on hugetlbfs directory, so
that only one process could perform hugetlbfs operations concurrently.
[1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
[2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Fixes: a5ff05d60f ("mem: support unmapping pages at runtime")
Fixes: 2a04139f66 ("eal: add single file segments option")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Currently, memseg lists for secondary process are allocated on
sync (triggered by init), when they are accessed for the first
time. Move this initialization to a separate init stage for
memalloc.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
For non-legacy mode, we are preallocating space for hugepages, so
we know in advance which pages we will be able to allocate, and
which we won't. However, the init procedure was using hugepage
counts gathered from sysfs and paid no attention to hugepage
sizes that were actually available for reservation, and failed
on attempts to reserve unavailable pages.
Fix this by limiting total page counts by number of pages
actually preallocated.
Also, VA preallocate procedure only looks at mountpoints that are
available, and expects pages to exist if a mountpoint exists. That
might not necessarily be the case, so also check if there are
hugepages available for a particular page size on a particular
NUMA node.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>
Previously, if we couldn't preallocate VA space on 32-bit for
one page size, we simply bailed out, even though we could've
tried allocating VA space with other page sizes.
For example, if the user had both 1G and 2M pages enabled and
asked DPDK to allocate memory on both sockets, DPDK
would have tried to allocate VA space for 1x1G page on both
sockets, failed and never tried again, even though it
could have allocated the same 1G of VA space for 512x2M pages.
Fix this by retrying with different page sizes if VA space
reservation failed.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>
32-bit mode has an upper limit on amount of VA space it can preallocate,
but the original implementation used the wrong constant, resulting in
failure to initialize due to integer overflow. Fix it by using the
correct constant.
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>
Previous code checked for both first/last elements being NULL,
but if they weren't, the expectation was that they're both
non-NULL, which will be the case under normal conditions, but
may not be the case due to heap structure corruption.
Coverity issue: 272566
Fixes: bb372060da ("malloc: make heap a doubly-linked list")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Technically, while the pointer would've been invalid if msl_idx
were invalid, we wouldn't have actually attempted to access the
pointer until verifying the index. Fix it by moving array access
to after we've verified validity of the index.
Coverity issue: 272574
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
If user has specified a flag to unmap the area right after mapping it,
we were passing an already-unmapped pointer to RTE_LOG. This is not an
issue since RTE_LOG doesn't actually dereference the pointer, but fix
it anyway by moving call to RTE_LOG to before unmap.
Coverity issue: 272584
Fixes: b7cc54187e ("mem: move virtual area function in common directory")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Coverity reports these lines as having no effect. Technically, we do
want for those lines to have no effect, however they would've likely
been optimized out. Add volatile qualifiers to ensure the code has
effects.
Coverity issue: 272608
Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Previously, if mmap failed to map page address at requested
address, we were attempting to unmap the wrong address. Fix it
by unmapping our actual mapped address, and jump further to
avoid unmapping memory that is not allocated.
Coverity issue: 272602
Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Previous code had an old rebase leftover from the time when
oldpolicy was an actual int, instead of a pointer. Fix it to
do the comparison by dereferencing the pointer.
Coverity issue: 272589
Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Normally, tailq entry should have a valid fd by the time we attempt
to map the segment. However, in case it doesn't, we're leaking fd,
so fix it.
Coverity issue: 272570
Fixes: 2a04139f66 ("eal: add single file segments option")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
We close fd if we managed to find it in the list of allocated
segment lists (which should always be the case under normal
conditions), but if we didn't, the fd was leaking. Close it if
we couldn't find it in the segment list. This is not an issue
as if the segment is zero length, we're getting rid of it
anyway, so there's no harm in not storing the fd anywhere.
Coverity issue: 272568
Fixes: 2a04139f66 ("eal: add single file segments option")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
We were closing descriptor before checking if mapping has
failed, but if it did, we did a second close afterwards. Fix
it by moving closing descriptor to after we've done all error
checks.
Coverity issue: 272560
Fixes: 2a04139f66 ("eal: add single file segments option")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
resize_hugefile() returns either 0 (which indicates success) or -1
(which indicates failure). We failed to check for success when the
--single-file-segments option is used.
Fixes: 2a04139f66 ("eal: add single file segments option")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The commit below introduced a pthread barrier for synchronization.
But two IPC threads block on the barrier, and never wake up.
(gdb) bt
#0 futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
#3 rte_thread_init (arg=0x7fffffffcfe0)
at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
#4 start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
#5 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Through analysis, we find that the barrier defined on the stack could be the
root cause. This patch changes to using heap memory for the barrier.
Fixes: d651ee4919 ("eal: set affinity for control threads")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
This patch introduces a new way of attaching an external buffer to an mbuf.
Attaching an external buffer is quite similar to mbuf indirection in
replacing the buffer address and length of an mbuf, but with a few differences:
- When an indirect mbuf is attached, refcnt of the direct mbuf would be
2 as long as the direct mbuf itself isn't freed after the attachment.
In such cases, the buffer area of a direct mbuf must be read-only. But
an external buffer has its own refcnt and it starts from 1. Unless
multiple mbufs are attached to an mbuf having an external buffer, the
external buffer is writable.
- There's no need to allocate buffer from a mempool. Any buffer can be
attached with appropriate free callback.
- Smaller metadata is required to maintain shared data such as refcnt.
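A rough attach sequence (hedged: helper names such as
rte_pktmbuf_ext_shinfo_init_helper() are from this series as introduced here;
buf, len, my_free_cb and m are placeholders):
    uint16_t buf_len = len;
    struct rte_mbuf_ext_shared_info *shinfo;
    /* place shared data (refcnt + free callback) at the tail of buf */
    shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, my_free_cb, buf);
    rte_pktmbuf_attach_extbuf(m, buf, rte_mem_virt2iova(buf), buf_len, shinfo);
    rte_pktmbuf_reset_headroom(m);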
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
This patch fixes the final condition check while moving virtqueue
descriptors.
Fixes: 3bb595ecd6 ("vhost/crypto: add request handler")
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch fixes the missing head descriptor correction for
indirect descriptors.
Fixes: 0aee242841 ("vhost/crypto: move to safe GPA translation API")
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
We should call the set_features callback after setting features in the virtio_net
structure, otherwise the vDPA driver cannot get the right features.
Fixes: 07718b4f87 ("vhost: adapt library for selective datapath")
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This reverts commit 394313fff3.
While the patch did solve a concurrency issue, it induces more
page copies as some clean pages are marked as dirty for
performance reasons. Moreover, as there is no more contention
when doing the logging, the rate of packets that can be processed is
higher, leading to even more pages being dirtied.
It has been reported that with more than one queue pair, and
with a relatively low packet rate (1 Mpps), the live migration
never converges until the flow is stopped.
Until a better solution is found, it is better to revert to the
old behaviour, i.e. using atomic operations for dirty page
logging.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The library folder name and the output library name are the same, with a few
exceptions including librte_ether.
This library is the network device abstraction layer; the name "ethdev" fits
better than "ether", and the library and header files are already named ethdev.
Also, there is an rte_ether.h in the net library, which can cause confusion.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Add rte_flow_action_count action data structure to enable shared
counters across multiple flows on a single port or across multiple
flows on multiple ports within the same switch domain. Also this enables
multiple count actions to be specified in a single flow action.
This patch also modifies the existing rte_flow_query API to take the
rte_flow_action structure as an input parameter instead of the
rte_flow_action_type enumeration to allow querying a specific action
from a flow rule when multiple actions of the same type are specified.
This patch also contains updates for the bonding, failsafe and mlx5 PMDs
and testpmd application which are affected by this API change.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Introduces a new item type RTE_FLOW_ITEM_TYPE_MARK which enables
flow patterns to match arbitrary integer values set by the
RTE_FLOW_ACTION_TYPE_MARK action in previously matched flows.
Add support for specification of the new MARK flow item in testpmd's CLI.
Update the testpmd documentation to describe the new MARK flow item support.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Add a jump action type which defines an action that allows a matched
flow to be redirected to the specified group. This allows physical and
logical flow table/group hierarchies to be defined through rte_flow.
This breaks ABI compatibility for the following public functions (as it
modifies the ordering of the rte_flow_action_type enumeration):
- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()
Add support for specification of new JUMP action to testpmd's flow
cli, and update the testpmd documentation to describe this new
action.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Add new flow action types and associated action data structures to
support the encapsulation and decapsulation of VXLAN and NVGRE tunnel
endpoints.
The RTE_FLOW_ACTION_TYPE_[VXLAN/NVGRE]_ENCAP action will cause the
matching flow to be encapsulated in the tunnel endpoint overlay
defined in the [vxlan/nvgre]_encap action data.
The RTE_FLOW_ACTION_TYPE_[VXLAN/NVGRE]_DECAP action will cause all
headers associated with the outermost tunnel endpoint of the specified
type to be removed from the matching flows.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Add switch domain allocate and free API to enable NET devices to
synchronise switch domain allocation.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Introduces a new structure, rte_eth_devargs, to support generic
ethdev arguments common across NET PMDs, with a new API,
rte_eth_devargs_parse, to support PMDs parsing these arguments. The
patch adds support for a representor argument passed with
the EAL -w option. The representor parameter allows the user to specify
which representor ports to initialise on a device.
The argument supports passing a single representor port, a list of
port values or a range of port values.
-w BDF,representor=1 # create representor port 1 on pci device BDF
-w BDF,representor=[1,2,5,6,10] # create representor ports in list
-w BDF,representor=[0-31] # create representor ports in range
Signed-off-by: Remy Horton <remy.horton@intel.com>
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add a new device flag to specify that an ethdev port is a port
representor. Extend the rte_eth_dev_info structure to expose device flags
to the user, which enables applications to discover if a port is a
representor port.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Add new bus-generic ethdev create/destroy APIs which are bus independent
and provide hooks for bus-specific initialisation.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Introduces a new port attribute for ethdev ports which denotes the
switch domain a port belongs to. By default all ports' switch
domain identifiers are set to RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID. Ports
which support the concept of switch domains can be configured with
the same switch domain id.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Add support for the following OpenFlow-defined actions:
- RTE_FLOW_ACTION_OF_POP_VLAN: pop the outer VLAN tag.
- RTE_FLOW_ACTION_OF_PUSH_VLAN: push a new VLAN tag.
- RTE_FLOW_ACTION_OF_SET_VLAN_VID: set the 802.1q VLAN id.
- RTE_FLOW_ACTION_OF_SET_VLAN_PCP: set the 802.1q priority.
- RTE_FLOW_ACTION_OF_POP_MPLS: pop the outer MPLS tag.
- RTE_FLOW_ACTION_OF_PUSH_MPLS: push a new MPLS tag.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This patch adds new tunnel types for MPLS-in-GRE and MPLS-in-UDP.
MPLS-in-GRE protocol link:
https://tools.ietf.org/html/rfc4023
MPLS-in-UDP protocol link:
https://tools.ietf.org/html/rfc7510
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
VXLAN-GPE enables VXLAN for all protocols. Protocol link:
https://www.ietf.org/id/draft-ietf-nvo3-vxlan-gpe-05.txt
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
RTE_FLOW_ACTION_TYPE_PORT_ID brings the ability to inject matching traffic
into a different device, as identified by its DPDK port ID.
This is normally only supported when the target port ID has some kind of
relationship with the port ID the flow rule is created against, such as
being exposed by a common physical device (e.g. a different port of an
Ethernet switch).
The converse pattern item, RTE_FLOW_ITEM_TYPE_PORT_ID, makes the resulting
flow rule match traffic whose origin is the specified port ID. Note that
specifying a port ID that differs from the one the flow rule is created
against is normally meaningless (if even accepted), but can make sense if
combined with the transfer attribute.
These must not be confused with their PHY_PORT counterparts, which refer to
physical ports using device-specific indices, but unlike PORT_ID are not
necessarily tied to DPDK port IDs.
This breaks ABI compatibility for the following public functions:
- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
This patch adds the missing action counterpart to the PHY_PORT pattern
item, that is, the ability to directly inject matching traffic into a
physical port of the underlying device.
It breaks ABI compatibility for the following public functions:
- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
While RTE_FLOW_ITEM_TYPE_PORT refers to physical ports of the underlying
device using specific identifiers, these are often confused with DPDK port
IDs exposed to applications in the global name space.
Since this pattern item is seldom used, rename it RTE_FLOW_ITEM_TYPE_PHY_PORT
for better clarity.
No ABI impact.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Contrary to all other pattern items, these are inconsistently documented as
affecting traffic instead of simply matching its origin, without provision
for the latter.
This commit clarifies documentation and updates PMDs since the original
behavior now has to be explicitly requested using the new transfer
attribute.
It breaks ABI compatibility for the following public functions:
- rte_flow_create()
- rte_flow_validate()
Impacted PMDs are bnxt and i40e, for which the VF pattern item is now only
supported when a transfer attribute is also present.
Fixes: b1a4b4cbc0 ("ethdev: introduce generic flow API")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
This new attribute enables applications to create flow rules that do not
simply match traffic whose origin is specified in the pattern (e.g. some
non-default physical port or VF), but actively affect it by applying the
flow rule at the lowest possible level in the underlying device.
It breaks ABI compatibility for the following public functions:
- rte_flow_copy()
- rte_flow_create()
- rte_flow_validate()
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
VLAN TCI is a 16-bit field broken down as PCP (3b), DEI (1b) and VID (12b).
The default mask used by PMDs for the VLAN pattern when one isn't provided
by the application comprises the entire TCI, which is problematic because
most devices only support VID matching.
This forces applications to always provide a mask limited to the VID part
in order to successfully apply a flow rule with a VLAN pattern item.
Moreover, applications rarely want to match PCP and DEI intentionally.
Given the above and since VID is what is commonly referred to when talking
about VLAN, this commit excludes PCP and DEI from the default mask.
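For illustration, a hedged sketch of the VID-only match that applications
previously had to request explicitly and that now corresponds to the default:
    /* match only VLAN ID 100; PCP and DEI bits are ignored */
    struct rte_flow_item_vlan vlan_spec = { .tci = RTE_BE16(100) };
    struct rte_flow_item_vlan vlan_mask = { .tci = RTE_BE16(0x0fff) };
    struct rte_flow_item item = {
            .type = RTE_FLOW_ITEM_TYPE_VLAN,
            .spec = &vlan_spec,
            .mask = &vlan_mask,
    };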
Fixes: 6de5c0f130 ("ethdev: define default item masks in flow API")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
TPID handling in rte_flow VLAN and E_TAG pattern item definitions is not
consistent with the normal stacking order of pattern items, which is
confusing to applications.
Problem is that when followed by one of these layers, the EtherType field
of the preceding layer keeps its "inner" definition, and the "outer" TPID
is provided by the subsequent layer, the reverse of how a packet looks
on the wire:
Wire: [ ETH TPID = A | VLAN EtherType = B | B DATA ]
rte_flow: [ ETH EtherType = B | VLAN TPID = A | B DATA ]
Worse, when QinQ is involved, the stacking order of VLAN layers is
unspecified. It is unclear whether it should be reversed (innermost to
outermost) as well given TPID applies to the previous layer:
Wire: [ ETH TPID = A | VLAN TPID = B | VLAN EtherType = C | C DATA ]
rte_flow 1: [ ETH EtherType = C | VLAN TPID = B | VLAN TPID = A | C DATA ]
rte_flow 2: [ ETH EtherType = C | VLAN TPID = A | VLAN TPID = B | C DATA ]
While specifying EtherType/TPID is hopefully rarely necessary, the stacking
order in case of QinQ and the lack of documentation remain an issue.
This patch replaces TPID in the VLAN pattern item with an inner
EtherType/TPID as is usually done everywhere else (e.g. struct vlan_hdr),
clarifies documentation and updates all relevant code.
It breaks ABI compatibility for the following public functions:
- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()
Summary of changes for PMDs that implement ETH, VLAN or E_TAG pattern
items:
- bnxt: EtherType matching is supported with and without VLAN, but TPID
matching is not and triggers an error.
- e1000: EtherType matching is only supported with the ETHERTYPE filter,
which does not support VLAN matching, therefore no impact.
- enic: same as bnxt.
- i40e: same as bnxt with existing FDIR limitations on allowed EtherType
values. The remaining filter types (VXLAN, NVGRE, QINQ) do not support
EtherType matching.
- ixgbe: same as e1000, with additional minor change to rely on the new
E-Tag macro definition.
- mlx4: EtherType/TPID matching is not supported, no impact.
- mlx5: same as bnxt.
- mvpp2: same as bnxt.
- sfc: same as bnxt.
- tap: same as bnxt.
Fixes: b1a4b4cbc0 ("ethdev: introduce generic flow API")
Fixes: 99e7003831 ("net/ixgbe: parse L2 tunnel filter")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
RSS hash types (ETH_RSS_* macros defined in rte_ethdev.h) describe the
protocol header fields of a packet that must be taken into account while
computing RSS.
When facing encapsulated (e.g. tunneled) packets, there is an ambiguity as
to whether these should apply to inner or outer packets. Applications need
the ability to tell exactly "where" RSS must be performed.
This is addressed by adding encapsulation level information to the RSS flow
action. Its default value is 0 and stands for the usual unspecified
behavior. Other values provide a specific encapsulation level.
Contrary to the change announced by commit 676b605182 ("doc: announce
ethdev API change for RSS configuration"), this patch does not affect
struct rte_eth_rss_conf but struct rte_flow_action_rss as the former is not
used anymore by the RSS flow action. ABI impact is therefore limited to
rte_flow.
This breaks ABI compatibility for the following public functions:
- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()
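For illustration, a hedged sketch of an RSS action hashing on inner headers
(queue list and hash types are placeholders):
    uint16_t queues[] = { 0, 1, 2, 3 };
    struct rte_flow_action_rss rss = {
            .func = RTE_ETH_HASH_FUNCTION_DEFAULT,
            .level = 2,   /* 0 = unspecified default, 2 = second (inner) level */
            .types = ETH_RSS_IP | ETH_RSS_UDP,
            .queue_num = RTE_DIM(queues),
            .queue = queues,
    };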
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
By definition, RSS involves some kind of hash algorithm, usually Toeplitz.
Until now it could not be modified on a flow rule basis and PMDs had to
always assume RTE_ETH_HASH_FUNCTION_DEFAULT, which remains the default
behavior when unspecified (0).
This breaks ABI compatibility for the following public functions:
- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Since its inception, the rte_flow RSS action has been relying in part on
external struct rte_eth_rss_conf for compatibility with the legacy RSS API.
This structure lacks parameters such as the hash algorithm to use, and more
recently, a method to tell which layer RSS should be performed on [1].
Given struct rte_eth_rss_conf will never be flexible enough to represent a
complete RSS configuration (e.g. RETA table), this patch supersedes it by
extending the rte_flow RSS action directly.
A subsequent patch will add a field to use a non-default RSS hash
algorithm. To that end, a field named "types" replaces the field formerly
known as "rss_hf" and standing for "RSS hash functions" as it was
confusing. Actual RSS hash function types are defined by enum
rte_eth_hash_function.
This patch updates all PMDs and example applications accordingly.
It breaks ABI compatibility for the following public functions:
- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()
[1] commit 676b605182 ("doc: announce ethdev API change for RSS
configuration")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
This patch replaces C99-style flexible arrays in struct rte_flow_action_rss
and struct rte_flow_item_raw with standard pointers to the same data.
They proved difficult to use in the field (e.g. no possibility of static
initialization) and unsuitable for C++ applications.
Affected PMDs and examples are updated accordingly.
This breaks ABI compatibility for the following public functions:
- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()
Fixes: b1a4b4cbc0 ("ethdev: introduce generic flow API")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This patch makes the following changes to flow rule actions:
- List order now matters, they are redefined as performed first to last
instead of "all simultaneously".
- Repeated actions are now supported (e.g. specifying QUEUE multiple times
now duplicates traffic among them). Previously only the last action of
any given kind was taken into account.
- No more distinction between terminating/non-terminating/meta actions.
Flow rules themselves are now defined as always terminating unless a
PASSTHRU action is specified.
These changes alter the behavior of flow rules in corner cases in order to
prepare the flow API for actions that modify traffic contents or properties
(e.g. encapsulation, compression) and for which order matters when combined.
Previously one would have had to do so through multiple flow rules by combining
PASSTHRU with priority levels, however this proved overly complex to
implement at the PMD level, hence this simpler approach.
This breaks ABI compatibility for the following public functions:
- rte_flow_create()
- rte_flow_validate()
PMDs with rte_flow support are modified accordingly:
- bnxt: no change, implementation already forbids multiple actions and does
not support PASSTHRU.
- e1000: no change, same as bnxt.
- enic: modified to forbid redundant actions, no support for default drop.
- failsafe: no change needed.
- i40e: no change, implementation already forbids multiple actions.
- ixgbe: same as i40e.
- mlx4: modified to forbid multiple fate-deciding actions and drop when
unspecified.
- mlx5: same as mlx4, with other redundant actions also forbidden.
- sfc: same as mlx4.
- tap: implementation already complies with the new behavior except for
the default pass-through modified as a default drop.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Upcoming changes in relation to the handling of actions list will make the
DUP action redundant as specifying several QUEUE actions will achieve the
same behavior. Besides, no PMD implements this action.
By removing an entry from enum rte_flow_action_type, this patch breaks ABI
compatibility for the following public functions:
- rte_flow_copy()
- rte_flow_create()
- rte_flow_query()
- rte_flow_validate()
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Although pattern items and actions examples end with "and so on", these
lists include all existing definitions and as a result are updated almost
every time new types are added. This is cumbersome and pointless.
This patch also synchronizes Doxygen and external API documentation wording
with a slight clarification regarding meta pattern items.
No fundamental API change.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
These enable more precise reporting of objects responsible for errors.
This breaks ABI compatibility for the following public functions:
- rte_flow_create()
- rte_flow_destroy()
- rte_flow_error_set()
- rte_flow_flush()
- rte_flow_isolate()
- rte_flow_query()
- rte_flow_validate()
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
The basic operations for port enumeration should not be
considered experimental in DPDK 18.05.
The iterator RTE_ETH_FOREACH_DEV was introduced in DPDK 17.05.
It uses the function rte_eth_find_next_owned_by() to get
only ownerless ports. Its API can be considered stable.
So the experimental flag is removed from rte_eth_find_next_owned_by().
The experimental flag is removed from rte_eth_dev_count_avail(),
which is the new name of the old function rte_eth_dev_count().
The experimental flag is set on rte_eth_dev_count_total()
in the .c file for consistency with the declaration in the .h file.
A lot of internal applications are fixed to not allow the experimental API.
Fixes: 8728ccf376 ("fix ethdev ports enumeration")
Fixes: d9a42a69fe ("ethdev: deprecate port count function")
Fixes: e70e26861e ("net/mvpp2: fix build")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Tested-by: David Marchand <david.marchand@6wind.com>
It's not possible to set up a queue when the port is started
because of a check in the ethdev layer. New capability flags are
added in order to relax this check for devices which support
queue setup at runtime. The functions rte_eth_[rx|tx]_queue_setup
will raise an error only if the port is started and runtime setup
of queues is not supported.
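A hedged sketch of how an application might use the new capability flags
(port_id, new_queue_id, sock and mp are placeholders):
    struct rte_eth_dev_info dev_info;
    rte_eth_dev_info_get(port_id, &dev_info);
    if (dev_info.dev_capa & RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP)
            /* the port is already started, yet a new Rx queue may be added */
            rte_eth_rx_queue_setup(port_id, new_queue_id, 512, sock, NULL, mp);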
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This patch introduces new TX offload flags for devices that support
IP or UDP tunneled packet L3/L4 checksum and TSO offload.
They will be used for non-standard tunnels.
The support from the device is for inner and outer checksums on
IPV4/TCP/UDP and TSO for *any packet with the following format*:
<some headers> / [optional IPv4/IPv6] / [optional TCP/UDP] / <some
headers> / [optional inner IPv4/IPv6] / [optional TCP/UDP]
For example the following packets can use this feature:
1. eth / ipv4 / udp / VXLAN / ip / tcp
2. eth / ipv4 / GRE / MPLS / ipv4 / udp
Please note that specific tunnel headers that contain payload length,
sequence id or checksum will not be updated.
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add a supported RSS hash function check in device configuration to
provide better error verbosity for application developers.
Signed-off-by: Xueming Li <xuemingl@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The skip_ip6_ext() function can be exported as a helper; it may be used
by some PMDs to skip IPv6 header extensions.
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Yong Wang <yongwang@vmware.com>
The rss_conf field is defined as a pointer to struct rte_eth_rss_conf.
Even assuming it is permanently allocated and a pointer copy is safe,
the pointed-to data may change and no longer reflect an applied flow rule.
This patch aligns with testpmd by making a deep copy instead.
Fixes: 18da437b5f ("ethdev: add flow rule copy function")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
According to
commit 315ee8374e ("doc: reduce initial offload API rework scope
to drivers"),
all PMDs should have moved to the new offloads API. Therefore it is safe
to remove the new->old conversion helpers.
The old->new helpers will remain to support applications which still use
the old API.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This patch adds APIs to support container create/destroy and device
bind/unbind with a container. It also provides an API for IOMMU programming
on a specified container.
A driver could use the "rte_vfio_container_create" helper to create a new
container from EAL, and use "rte_vfio_container_group_bind" to bind a device
to the newly created container. During rte_vfio_setup_device the container
bound to the device will be used for IOMMU setup.
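A rough sketch of the intended flow (hedged; the group number and the
vaddr/iova/len DMA mapping values are placeholders):
    int cfd = rte_vfio_container_create();
    /* bind the device's IOMMU group to the new container */
    rte_vfio_container_group_bind(cfd, iommu_group_num);
    /* program the IOMMU of this container for a memory region */
    rte_vfio_container_dma_map(cfd, vaddr, iova, len);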
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently the EAL VFIO framework binds the VFIO group fd to the default
container fd during rte_vfio_setup_device, while in some cases,
e.g. vDPA (vhost data path acceleration), we want to put the VFIO group
into a separate container and program the IOMMU via this container.
This patch extends the vfio_config structure to contain per-container
user_mem_maps and defines an array of vfio_config. The next patch will
build on this to add the container API.
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
A build error has been reported by the Intel build system:
SUSE12SP3_64 / Linux 3.7.10-1 / GCC 4.7.2
lib/librte_vhost/vhost_crypto.c: In function ‘rte_vhost_crypto_set_zero_copy’:
lib/librte_vhost/vhost_crypto.c:1192:2: error:
comparison of unsigned expression < 0 is always false
As enums can be either signed or unsigned, this patch removes
the negative check and casts the upper limit check to unsigned.
Fixes: 939066d965 ("vhost/crypto: add public function implementation")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The auxiliary vector read is implemented only for Linux.
It could be done with procstat_getauxv() for FreeBSD.
Since the commit below, the auxiliary vector functions
are compiled for every architecture, including x86,
which is tested with FreeBSD.
This patch moves the Linux implementation into the Linux directory,
and adds a fake/empty implementation for FreeBSD.
Fixes: 2ed9bf3307 ("eal: abstract away the auxiliary vector")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The fake getauxval function does not use its parameter.
So the compiler raised this error:
lib/librte_eal/common/eal_common_cpuflags.c:25:25: error:
unused parameter 'type'
Fixes: 2ed9bf3307 ("eal: abstract away the auxiliary vector")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If the mempool manager supports object blocks (physically and virtually
contiguous sets of objects), it is sufficient to get the first
object only, and the function allows avoiding filling in
information about each block member.
Signed-off-by: Artem V. Andreev <artem.andreev@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Primarily, it is intended as a way for the mempool driver to provide
additional information on how it lays out objects inside the mempool.
Signed-off-by: Artem V. Andreev <artem.andreev@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
This message looks suspicious and is seen on a healthy testpmd.
EAL: WARNING: Master core has no memory on local socket!
The message is wrong: the master lcore is 0 and its socket is 0,
and there are multiple available memory segments on socket 0.
At that point in the startup process, the count value is zero,
meaning the segments are not used yet, so check_socket gets confused.
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
rte_lcore_has_role() returns 0 if the role of the lcore matches the requested
role. The return value of the API is confusing, and this is a known
problem with a deprecation notice announcing the change to more
intuitive semantics:
Commit 064518f68d ("doc: announce EAL API change to lcore role function")
Implement the changes announced in the deprecation notice, and remove it.
Also, fix usages of this API to reflect the change. The control thread patches
expected the new behavior and were broken before; now they are fixed as well.
Fixes: d651ee4919 ("eal: set affinity for control threads")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
This commit removes the experimental tags from the
service cores functions; they now become part of the
main DPDK API/ABI.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Coverity was complaining about not checking the result of a call to
fcntl() for unlocking the file. Disregarding the fact that an error
value returned from the fcntl() unlock call is highly unlikely in the
first place, we subsequently call close() on that same fd,
which will drop the lock, which makes the call to fcntl() unnecessary.
Fix this by removing the call to fcntl() altogether.
Coverity issue: 272607
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Regular expressions are not the best way to match a hierarchical
pattern like dynamic log levels, and the separator for dynamic
log levels is a period, which is the regex wildcard character.
A better solution is to use filename matching ('globbing') so
that log levels match like file paths. For compatibility,
a colon is used to separate pattern-match-style arguments. For
example:
--log-level 'pmd.net.virtio.*:debug'
This also makes the documentation match what really happens
internally.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
We don't want the format of the EAL log level saved values to be visible
in the ABI. Move them to private storage in eal_common_log.
This includes a minor optimization: compile the regular expression for
each log match once, rather than each time it is used.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Rather than attempting to read the contents of the auxv directly,
prefer to use an exposed API - and only if that doesn't exist, attempt
to load the vector. This is because on some systems, when a user
is downgraded, the /proc/self/auxv file retains the old ownership
and permissions. The original /proc/self/auxv method is retained as the
fallback. This also removes a potential abort() in the code when compiled
with NDEBUG. A quick parse of the code shows that many (if not all) of
the parsed CPU flags aren't used internally, so it should be okay.
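A minimal sketch of that approach, assuming glibc's getauxval() is the
exposed API and reading /proc/self/auxv is the fallback (names below are
for illustration, not the exact code of the patch):

#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (glibc >= 2.16) */
#include <elf.h>        /* Elf64_auxv_t */
#include <fcntl.h>
#include <unistd.h>

static unsigned long
get_hwcap(void)
{
    unsigned long hwcap = getauxval(AT_HWCAP);

    if (hwcap != 0)
        return hwcap;

    /* Fallback: parse /proc/self/auxv, as the original code did
     * (64-bit layout assumed for brevity). */
    int fd = open("/proc/self/auxv", O_RDONLY);
    if (fd < 0)
        return 0;

    Elf64_auxv_t entry;
    while (read(fd, &entry, sizeof(entry)) == sizeof(entry)) {
        if (entry.a_type == AT_HWCAP) {
            hwcap = entry.a_un.a_val;
            break;
        }
    }
    close(fd);
    return hwcap;
}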
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Add the priority RTE_PRIORITY_LAST, used for initialization routines
meant to be run after all other constructors.
This priority becomes the default priority for all DPDK constructors.
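For illustration, this builds on the GCC/Clang constructor-priority
mechanism; constructors run in ascending priority order, so a very large
priority value runs last. The numeric values below are placeholders, not
the real RTE_PRIORITY_* values:

#include <stdio.h>

#define PRIO_BUS   110     /* placeholder for an "early" priority */
#define PRIO_LAST  65535   /* highest value: runs after all other constructors */

static void __attribute__((constructor(PRIO_BUS)))
bus_ctor(void) { printf("bus constructor\n"); }

static void __attribute__((constructor(PRIO_LAST)))
last_ctor(void) { printf("last constructor\n"); }

int main(void) { return 0; }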
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Build a central list to quickly see the priorities used for
constructors, allowing one to verify that they are both above 100 and in
the proper order.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
The previous symbols were deprecated for two releases.
They are now marked as such and cannot be used anymore.
They are replaced by ones respecting the new namespace, which are marked
experimental.
As a result, eth_dev attach and detach are slightly reworked to follow
the changes.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
rte_eal_devargs is useless, rte_devargs is sufficient.
Only experimental functions are changed for now.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
rte_eal_devargs_parse() can be used by EAL subsystems, drivers and
applications alike.
Device parameters may be presented with a different structure each time:
as a single declaration string or as several strings, each describing
different parts of the declaration.
To simplify the use of this parsing facility, its parameters are made
variadic.
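A rough sketch of the variadic pattern (the helper name and exact
signature below are illustrative assumptions, not the real API): the
printf-style pieces are assembled into one declaration string before it
is parsed.

#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: assemble the devargs declaration from several
 * pieces before handing it to the single-string parser. */
static int
devargs_parse_sketch(char *out, size_t len, const char *format, ...)
{
    va_list ap;
    int n;

    va_start(ap, format);
    n = vsnprintf(out, len, format, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= len)
        return -1;  /* error or truncated */
    /* the assembled string would then be split into bus name,
     * device name and driver arguments */
    return 0;
}

int
main(void)
{
    char decl[128];

    /* a single string or split pieces both end up as one declaration */
    devargs_parse_sketch(decl, sizeof(decl), "%s,%s", "net_null0", "size=64");
    printf("%s\n", decl);
    return 0;
}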
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Initially, rte_devargs was meant to be populated once and sometimes
accessed, then never emptied.
With the new hotplug functionality having better standing, new usage
appeared with repeated addition of devices and their subsequent removal.
Exposing devargs_list pushed bus drivers and libraries to be careless
and inconsistent in their memory management. Making it private will
allow rationalizing this part of the EAL and ensure that fewer memory
leaks occur during operations.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
In preparation for making devargs_list private:
bus drivers generally need to access the rte_devargs pertaining to their
operations, and this match is a common operation for bus drivers.
Add a new accessor for the rte_devargs list.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The management threads must not bother the dataplane or service cores.
Set the affinity of these threads accordingly.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
To avoid code duplication, add a parameter to rte_ctrl_thread_create()
to specify the name of the thread.
This requires adding a wrapper for the thread start routine in
rte_thread_init(), which first waits until the thread is configured.
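Assumed usage, with a pthread-style start routine plus a thread name
(the exact prototype is an assumption here; check rte_lcore.h for the
authoritative declaration):

#include <pthread.h>
#include <stdio.h>
#include <rte_lcore.h>   /* assumed location of rte_ctrl_thread_create() */

static void *
mgmt_loop(void *arg)
{
    (void)arg;
    /* management work, kept off the dataplane cores by the wrapper */
    return NULL;
}

static int
start_mgmt_thread(void)
{
    pthread_t tid;
    int ret = rte_ctrl_thread_create(&tid, "my-mgmt", NULL, mgmt_loop, NULL);

    if (ret < 0)  /* negative value on error, per the DPDK convention */
        printf("control thread creation failed: %d\n", ret);
    return ret;
}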
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Many parts of DPDK use their own management threads. Introduce a new
wrapper for thread creation that will be extended in the next commits to set
the name and affinity.
To be consistent with other DPDK APIs, the return value is negative in
case of error, which was not the case for pthread_create().
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Only a cosmetic change: the *_LEN defines are already used
when defining the buffer. Using sizeof() ensures that the length
stays consistent, even if the definition is modified.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Adjust the init sequence: put mp channel init before bus scan
so that we can init the vdev bus through mp channel in the
secondary process before the bus scan.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>
The mempool get/put API takes care of the cache itself, but sometimes it is
required to flush the cache explicitly.
The function is moved within the file since it now requires
rte_mempool_default_cache().
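For illustration, an explicit flush of a user-owned cache might look like
this (a sketch; see rte_mempool.h for the exact prototype):

#include <rte_mempool.h>

/* Sketch: return locally cached objects to the mempool backing store. */
static void
drain_local_cache(struct rte_mempool *mp, struct rte_mempool_cache *cache)
{
    /* After this call the cache is empty and all objects are back in
     * the common pool, visible to other lcores. */
    rte_mempool_cache_flush(cache, mp);
}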
Signed-off-by: Artem V. Andreev <artem.andreev@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The callback is not required any more since there is a new callback
to populate objects using a provided memory area, which provides
the same information.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Move rte_mempool_xmem_size() code to internal helper function
since it is required in two places: deprecated rte_mempool_xmem_size()
and non-deprecated rte_mempool_op_calc_mem_size_default().
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The callback was introduced to let generic code know about the octeontx
mempool driver requirements: use of a single physically contiguous
memory chunk to store all objects, and alignment of object addresses to
the total object size. Now these requirements are met using the new
callbacks to calculate the required memory chunk size and to populate
objects using the provided memory chunk.
These capability flags are not used anywhere else.
Restricting capabilities to flags is not generic and is likely to
be insufficient to describe mempool driver features. If required
in the future, an API which returns structured information may be
added.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The callback allows customizing how objects are stored in the
memory chunk. A default implementation of the callback, which simply
puts objects one by one, is available.
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The size of the memory chunk required to populate mempool objects depends
on how the objects are stored in memory. Different mempool drivers
may have different requirements, and a new operation allows calculating
the memory size in accordance with driver requirements and advertising
requirements on minimum memory chunk size and alignment
in a generic way.
Bump ABI version since the patch breaks it.
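A rough sketch of what such an operation computes (the callback name and
signature below are illustrative assumptions, not the final API):

#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical driver callback: report how much memory is needed for
 * obj_num objects and what the minimum chunk size / alignment must be. */
static ssize_t
my_calc_mem_size(uint32_t obj_num, size_t total_obj_size,
                 size_t *min_chunk_size, size_t *align)
{
    size_t mem_size = (size_t)obj_num * total_obj_size;

    /* A driver needing one physically contiguous chunk would set
     * min_chunk_size = mem_size; a default only needs one object. */
    *min_chunk_size = total_obj_size;
    *align = 64;            /* e.g. cache-line alignment */
    return (ssize_t)mem_size;
}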
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The callback to calculate the required memory area size may require the
mempool driver data to be already allocated and initialized.
Signed-off-by: Artem V. Andreev <artem.andreev@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The flag MEMPOOL_F_NO_PHYS_CONTIG is renamed to MEMPOOL_F_NO_IOVA_CONTIG
to follow the IOVA (IO address) contiguity terminology.
MEMPOOL_F_NO_PHYS_CONTIG is kept for backward compatibility and
deprecated.
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
In the original implementation, a timeout event for an async request
was ignored. As a result, an async request would never
trigger the action if it could not receive any reply any more.
Fix this by counting a timeout as a processed reply.
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Following the commit below, we change some internal function and variable
names:
commit ce3a731235 ("eal: rename IPC request as synchronous one")
Also use calloc to replace malloc + memset as a code cleanup.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
gettimeofday() returning a negative value is highly unlikely,
but if it ever happens, we will exit without unlocking the mutex.
Arguably at that point we'll have bigger problems, but fix this
issue anyway.
Coverity issue: 272595
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
This also silences (or should silence) a few Coverity false
positives where we used strcpy before (Coverity complained
about not checking buffer size, but source buffers were
always known to be sized correctly).
Coverity issue: 260407, 272565, 272582
Fixes: bacaa27540 ("eal: add channel for multi-process communication")
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
We get the pointer to the mask before we check if the fbarray is NULL.
Fix this by moving the mask pointer retrieval to after the NULL check.
Coverity issue: 272579
Fixes: c44d09811b ("eal: add shared indexed file-backed array")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
fbarray stores its data in a shared file, which is not hidden.
This leads to polluting the user's HOME directory with visible
files when running DPDK as non-root. Change fbarray to always
create hidden files by default.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The code to convert IPv4 and IPv6 address strings into a binary format
(inet_pton) was included in the cmdline library because DPDK was
historically compiled in environments where the standard inet_pton()
function is not available. Today, this is not the case and the standard
inet_pton() can be used.
This patch removes the internal inet_pton*() functions and their
specific license.
There is a small functional impact: IP addresses like 012.34.56.78
are not valid anymore.
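For reference, the standard function now used behaves as follows
(minimal standalone example):

#include <arpa/inet.h>
#include <stdio.h>

int
main(void)
{
    struct in_addr v4;

    /* returns 1 on success, 0 if the string is not a valid address */
    printf("%d\n", inet_pton(AF_INET, "12.34.56.78", &v4));   /* 1 */
    printf("%d\n", inet_pton(AF_INET, "012.34.56.78", &v4));  /* 0: leading zero rejected */
    return 0;
}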
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
A previously mapped region is skipped during the search, leading to
DMA unmap failures.
This patch fixes it and rewords the comment.
Fixes: 73a6390859 ("vfio: allow to map other memory regions")
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Extend 'userdata' to be used for IPsec events too.
IPsec events would have some metadata which would uniquely identify the
security session for which the event is raised. But the application would
need some construct which it can understand. The 'userdata' solves a
similar problem for inline processed inbound traffic. Update the
documentation to extend the usage of 'userdata'.
Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Add an ESN soft limit to the configuration. This will be used in case of
protocol offload. Per SA, the application can specify for which ESN the
security device needs to notify the application. In case of an eth device
(inline protocol), the rte_eth_event framework would raise an IPsec event.
Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Add support for IPsec events in the rte_eth_event framework. In inline
IPsec offload, the per-packet protocol-defined variables, like ESN,
would be managed by the PMD. In such cases, the PMD would need IPsec events
to notify the application about various conditions, like ESN overflow.
Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The application may want to store private data along with the
rte_cryptodev that is transparent to the rte_cryptodev layer.
For example, an eventdev-based application submitting an
rte_cryptodev_sym_session operation may want to indicate the event
information required to construct a new event that will be
enqueued to the eventdev after completion of the rte_cryptodev_sym_session
operation. This patch provides a mechanism for the application
to associate this information with the rte_cryptodev_sym_session session.
The application can set the private data using
rte_cryptodev_sym_session_set_private_data() and retrieve it using
rte_cryptodev_sym_session_get_private_data().
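Usage is roughly as follows (a sketch; the exact parameter list of the
setter is an assumption here, check rte_cryptodev.h):

#include <rte_cryptodev.h>
#include <stdint.h>

struct app_event_md {
    uint8_t  queue_id;
    uint32_t flow_id;
};

/* Sketch: stash event metadata in the session, read it back later. */
static void
attach_event_metadata(struct rte_cryptodev_sym_session *sess,
                      struct app_event_md *md)
{
    /* assumed signature: (session, data pointer, data size) */
    rte_cryptodev_sym_session_set_private_data(sess, md, sizeof(*md));
}

static struct app_event_md *
fetch_event_metadata(struct rte_cryptodev_sym_session *sess)
{
    return rte_cryptodev_sym_session_get_private_data(sess);
}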
Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The application may want to store private data along with the
rte_crypto_op that is transparent to the rte_cryptodev layer.
For example, an eventdev-based application submitting a
crypto session-less operation may want to indicate the event
information required to construct a new event that will be
enqueued to the eventdev after completion of the crypto
operation. This patch provides a mechanism for the application
to associate this information with the rte_crypto_op in
session-less mode.
Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Add SHA3 family authentication algorithm support for
CCP crypto PMD. This patch defines new macros for SHA3
algorithms in the DPDK crypto framework.
Signed-off-by: Ravi Kumar <ravi1.kumar@amd.com>
Pass an rte_driver to the RTE_PMD_REGISTER_CRYPTO_DRIVER macro
rather than an unspecified container which holds an rte_driver.
All the macro actually needs is the rte_driver, not the
container holding it.
This paves the way for a later patch in which a driver
will be registered which does not naturally derive from a
container and so avoids having to create an arbitrary container
to pass in the rte_driver.
This patch changes the cryptodev lib macro and all the
PMDs which use it.
Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
Reviewed-by: Akhil Goyal <akhil.goyal@nxp.com>
This patch marks rte_vhost_gpa_to_vva() as deprecated because
it is unsafe. Applications relying on this API should move
to the new rte_vhost_va_from_guest_pa() API and check the
returned length to avoid out-of-bounds accesses.
This issue has been assigned CVE-2018-1059.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch uses the new rte_vhost_va_from_guest_pa() API
to ensure the whole descriptor buffer is mapped contiguously
in the application virtual address space.
It does not handle buffers that are discontiguous in host virtual
address space, but only returns an error.
This issue has been assigned CVE-2018-1059.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch enables the handling of buffers non-contiguous in
process virtual address space in the enqueue path when mergeable
buffers are used.
When the virtio-net header doesn't fit in a single chunk, it is
computed in a local variable and copied to the buffer chunks
afterwards.
For packet content, the copy length is limited to the chunk
size, the next chunks' VAs being fetched afterwards.
This issue has been assigned CVE-2018-1059.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch enables the handling of buffers non-contiguous in
process virtual address space in the enqueue path when mergeable
buffers aren't used.
When the virtio-net header doesn't fit in a single chunk, it is
computed in a local variable and copied to the buffer chunks
afterwards.
For packet content, the copy length is limited to the chunk
size, the next chunks' VAs being fetched afterwards.
This issue has been assigned CVE-2018-1059.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch enables the handling of buffers non-contiguous in
process virtual address space in the dequeue path.
When the virtio-net header doesn't fit in a single chunk, it is
copied into a local variable before being processed.
For packet content, the copy length is limited to the chunk
size, the next chunks' VAs being fetched afterwards.
This issue has been assigned CVE-2018-1059.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds support for non-contiguous indirect descriptor
tables in VA space.
When this happens, which is unlikely, a table is allocated and the
non-contiguous content is copied into it.
This issue has been assigned CVE-2018-1059.
Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch ensures that the whole address range is mapped when
translating addresses from the master's addresses (e.g. QEMU host
addresses) to process VAs.
This issue has been assigned CVE-2018-1059.
Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This new rte_vhost_va_from_guest_pa API takes an extra len
parameter, used to specify the size of the range to be mapped.
The effective mapped range is returned via the len parameter.
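Callers are expected to check the returned length, roughly like this
(a sketch based on the description above):

#include <rte_vhost.h>
#include <stdint.h>

/* Sketch: translate a guest range and refuse anything partially mapped. */
static void *
map_guest_range(struct rte_vhost_memory *mem, uint64_t gpa, uint64_t size)
{
    uint64_t len = size;
    uint64_t vva = rte_vhost_va_from_guest_pa(mem, gpa, &len);

    if (vva == 0 || len < size)
        return NULL;    /* range not (fully) contiguous in our VA space */

    return (void *)(uintptr_t)vva;
}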
This issue has been assigned CVE-2018-1059.
Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
There is currently no check done on the length when translating
guest addresses into host virtual addresses. Also, there is no
guarantee that the guest address range is contiguous in
the host virtual address space.
This patch prepares vhost_iova_to_vva() and its callers to
return and check the mapped size. If the mapped size is smaller
than the requested size, the caller handles it as an error.
This issue has been assigned CVE-2018-1059.
Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch fixes the size passed at indirect descriptor
table translation time: it is the len field of the descriptor,
not the size of a single descriptor.
This issue has been assigned CVE-2018-1059.
Fixes: 62fdb8255a ("vhost: use the guest IOVA to host VA helper")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Support of strlcpy has recently been added to DPDK.
This replacement has been generated by the coccinelle script:
devtools/cocci.sh devtools/cocci/strlcpy.cocci
Fixes: 0d0f478d04 ("eal/linux: add uevent parse and process")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Add a few details as reminders about the TSO flag, checksum flags and header lengths.
The doxygen syntax for MPLS-in-UDP is fixed.
Fixes: d95188551f ("mbuf: introduce new Tx offload flag for MPLS-in-UDP")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
When introducing rte_eth_tx_prepare(), the constraints on checksum
pre-filling for Tx offloads were relaxed because they are implemented in
the PMDs with the rte_net_intel_cksum_flags_prepare() helper.
As a consequence, these old requirements are removed for:
- PKT_TX_OUTER_IP_CKSUM
- PKT_TX_IP_CKSUM
- PKT_TX_[L4]_CKSUM
- PKT_TX_TCP_SEG
Not sure SCTP offload is properly implemented though.
A reference to rte_eth_tx_prepare() is added in rte_eth_tx_burst() doc.
Fixes: 609dd68ef1 ("mbuf: enhance the API documentation of offload flags")
Fixes: 4fb7e803eb ("ethdev: add Tx preparation")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add librte_rawdev to the meson build of DPDK.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Unless a library cannot be built for a specific platform (generally
BSD), it will always be available. Therefore remove checks for IP
fragmentation and ACL libraries, since these are built for all
platforms.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
ICC complains about a variable being used before its value is set.
Since the variable is only assigned in the for loop,
its declaration is moved inside the loop and it is initialized.
lib/librte_eventdev/rte_event_timer_adapter.c(708): error #592:
variable "ret" is used before its value is set
RTE_SET_USED(ret);
Fixes: 6750b21bd6 ("eventdev: add default software timer adapter")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
The hugedir returned by get_hugepage_dir is allocated by strdup
but not released. Replace snprintf with a more suitable strlcpy.
Coverity issue: 272585
Fixes: cb97d93e9d ("mem: share hugepage info primary and secondary")
Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Sometimes gcc does not inline a function despite the *inline* keyword;
we observed that rte_movX is not inlined when doing performance profiling,
so use the *always_inline* keyword to force gcc to inline the function.
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
The original code relies on the private channel for primary and
secondary communication. Change it to use the generic multi-process
channel.
Note that with this change, dpdk-pdump will not be compatible with
older-version DPDK applications.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Previously, VFIO used its own private channel for the secondary
process to get the container fd and group fds from the primary process.
This patch changes it to use the generic mp channel.
Test:
1. Bind two NICs to vfio-pci.
2. Start the primary and secondary process.
$ (symmetric_mp) -c 2 -- -p 3 --num-procs=2 --proc-id=0
$ (symmetric_mp) -c 4 --proc-type=auto -- -p 3 \
--num-procs=2 --proc-id=1
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Some DPDK applications wrongly assume these requirements:
- no hotplug, i.e. ports are never detached
- all allocated ports are available to the application
Such applications iterate over ports by their own means.
The most common pattern is to request the port count and
assume ports with an index in the range [0..count[ can be used.
In order to fix this common mistake in all external applications,
the function rte_eth_dev_count is deprecated, while the new functions
rte_eth_dev_count_avail and rte_eth_dev_count_total are introduced.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Some DPDK applications wrongly assume these requirements:
- no hotplug, i.e. ports are never detached
- all allocated ports are available to the application
Such applications assume a valid port index is in the range [0..count[.
Using such a wrong design has the following consequences:
- new ports having an index higher than the port count won't be considered valid
- old ports being detached (RTE_ETH_DEV_UNUSED) can be considered valid
Such mistakes will be less common with growing hotplug awareness.
All applications and examples inside this repository - except testpmd -
must be fixed to use the function rte_eth_dev_is_valid_port.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Some DPDK applications wrongly assume these requirements:
- no hotplug, i.e. ports are never detached
- all allocated ports are available to the application
Such applications iterate over ports by their own means.
The most common pattern is to request the port count and
assume ports with an index in the range [0..count[ can be used.
There are three consequences when using such a wrong design:
- new ports having an index higher than the port count won't be seen
- old ports being detached (RTE_ETH_DEV_UNUSED) can be seen as ghosts
- failsafe sub-devices (RTE_ETH_DEV_DEFERRED) will be seen by the application
Such mistakes will be less common with growing hotplug awareness.
All applications and examples inside this repository - except testpmd -
must be fixed to use the iterator RTE_ETH_FOREACH_DEV.
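The intended pattern, instead of looping over [0..count[, is simply
(sketch):

#include <rte_ethdev.h>
#include <stdio.h>

static void
list_usable_ports(void)
{
    uint16_t port_id;

    /* Iterates only over valid, non-deferred ports, whatever their index. */
    RTE_ETH_FOREACH_DEV(port_id)
        printf("port %u is usable\n", port_id);
}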
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The initial objective of
commit d9f0d3a1ff ("ring: remove split cacheline build setting")
was to add an empty cache line between the producer and consumer
data (on platforms with a cache line size of 64B), preventing them
from being on adjacent cache lines.
Following discussion on the mailing list, it appears that this
also imposes an alignment constraint that is not required.
This patch removes the extra alignment constraint and adds the
empty cache lines using padding fields in the structure. The
size of rte_ring structure and the offset of the fields remain
the same on platforms with cache line size = 64B:
rte_ring = 384
rte_ring.name = 0
rte_ring.flags = 32
rte_ring.memzone = 40
rte_ring.size = 48
rte_ring.mask = 52
rte_ring.prod = 128
rte_ring.cons = 256
But it has an impact on platform where cache line size is 128B:
rte_ring = 384 -> 768
rte_ring.name = 0
rte_ring.flags = 32
rte_ring.memzone = 40
rte_ring.size = 48
rte_ring.mask = 52
rte_ring.prod = 128 -> 256
rte_ring.cons = 256 -> 512
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
While debugging startup issues encountered with Clang (see "eal: fix
undefined behavior in fbarray"), I noticed that fbarray stores indices,
sizes and masks on signed integers involved in bitwise operations.
Such operations almost invariably cause undefined behavior with values that
cannot be represented by the result type, as is often the case with
bit-masks and left-shifts.
This patch replaces them with unsigned integers as a safety measure and
promotes a few internal variables to larger types for consistency.
Coverity issue: 272598, 272599
Fixes: c44d09811b ("eal: add shared indexed file-backed array")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
According to GCC documentation [1], the __builtin_clz() family of functions
yield undefined behavior when fed a zero value. There is one instance in
the fbarray code where this can occur.
Clang (at least version 3.8.0-2ubuntu4) seems much more sensitive to this
than GCC and yields random results when compiling optimized code, as shown
below:
#include <stdio.h>
int main(void)
{
volatile unsigned long long moo;
int x;
moo = 0;
x = __builtin_clzll(moo);
printf("%d\n", x);
return 0;
}
$ gcc -O3 -o test test.c && ./test
63
$ clang -O3 -o test test.c && ./test
1742715559
$ clang -O0 -o test test.c && ./test
63
Even 63 can be considered an unexpected result given the number of leading
zeroes should be the full width of the underlying type, i.e. 64.
In practice it causes find_next_n() to sometimes return negative values
interpreted as errors by caller functions, which prevents DPDK applications
from starting due to inability to find free memory segments:
# testpmd [...]
EAL: Detected 32 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Multi-process socket /var/run/.rte_unix
EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list
EAL: FATAL: Cannot init memory
EAL: Cannot init memory
PANIC in main():
Cannot init EAL
4: [./build/app/testpmd(_start+0x29) [0x462289]]
3: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)
[0x7f19d54fc830]]
2: [./build/app/testpmd(main+0x8a3) [0x466193]]
1: [./build/app/testpmd(__rte_panic+0xd6) [0x4efaa6]]
Aborted
This problem appears with commit 66cc45e293 ("mem: replace memseg with
memseg lists") however the root cause is introduced by a prior patch.
[1] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
Fixes: c44d09811b ("eal: add shared indexed file-backed array")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Vhost-Crypto shall not be compiled if rte_cryptodev is disabled.
This patch fixes this by adding a check to the Makefile.
Fixes: d090c7f86a76 ("vhost/crypto: update makefile")
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
We lock the hotplug during init, but do not unlock it if we couldn't
register multiprocess callbacks. Add the missing unlock.
Fixes: 07dcbfe010 ("malloc: support multiprocess memory hotplug")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Earlier fix for race condition introduced a bug where mutex
wasn't unlocked if message failed to be sent. Fix all of this
by moving locking out of mp_request_sync() altogether.
Fixes: da5957821b ("eal: fix race condition in IPC request")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
We are trying to notify the sender that the response from the current
process should be ignored, but we didn't specify which request this
response was for. Fix this by copying the request name from the original
message.
Fixes: 579a4ccc34 ("eal: ignore IPC messages until init is complete")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Previously, we were removing a request from the list only if we
had succeeded in sending it. This resulted in leaving an invalid
pointer in the request list.
Fix this by only adding new requests to the request list if we
have succeeded in sending them.
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Previously, we were adding synchronous requests to the request list
after checking if the request existed. However, we only
removed the request from the request list if we had succeeded in
sending it. In case of a failed request send, we left an
invalid pointer in the request list.
Fix this by only adding the request to the list once we succeed in
sending it.
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
EAL did not stop processing further asynchronous requests on
encountering a request that should trigger the callback. This
resulted in erasing valid requests but not triggering them.
Fix this by stopping the loop once we have a request that
can trigger the callback. Once triggered, we go back to scanning
the request queue until there are no more callbacks to trigger.
Fixes: f05e26051c ("eal: add IPC asynchronous request")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Previously, VFIO functions were not compiled in and exported if
VFIO compilation was disabled. Fix this by actually compiling
all of the functions unconditionally, and provide missing
prototypes on Linux.
Fixes: 279b581c89 ("vfio: expose functions")
Fixes: 73a6390859 ("vfio: allow to map other memory regions")
Fixes: 964b2f3bfb ("vfio: export some internal functions")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch removes the declaration of rte_eventdev_driver from
rte_eventdev.h, as it is not used anymore; pci_eventdev_skeleton_pmd
has moved to use rte_pci_driver instead of rte_eventdev_driver.
Fixes: 7214438d93 ("eventdev: remove PCI dependency from generic structures")
Cc: stable@dpdk.org
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
If an eventdev PMD does not wish to provide event timer adapter ops
definitions, the library will fall back to a default software
implementation whose entry points are added by this commit.
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
The introduction of the event timer adapter library adds a dependency
on the rte_timer library from the rte_eventdev library. Update the
order so that the timer library comes after the eventdev library in the
linker command when statically linking applications.
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
This commit adds the logic that is shared by all event timer adapter
drivers; the common code handles instance allocation and some
initialization.
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Event devices can be coupled with various components to provide
new event sources by using event adapters. The event timer adapter
is one such adapter; it bridges event devices and timer mechanisms.
This library extends the event-driven programming model by
introducing a new type of event that represents a timer expiration,
and it provides APIs with which adapters can be created or destroyed
and event timers can be armed and canceled.
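Very roughly, arming a timer looks like the following (a sketch; the
structure fields and function names used here are assumptions and should
be checked against rte_event_timer_adapter.h):

#include <rte_event_timer_adapter.h>

/* Sketch: arm one event timer that will enqueue an event on expiry. */
static int
arm_one_timer(struct rte_event_timer_adapter *adapter,
              struct rte_event_timer *tim, uint64_t ticks, uint8_t queue_id)
{
    struct rte_event_timer *tims[1] = { tim };

    /* describe the event to generate when the timer expires */
    tim->ev.op = RTE_EVENT_OP_NEW;
    tim->ev.queue_id = queue_id;
    tim->ev.event_type = RTE_EVENT_TYPE_TIMER;
    tim->timeout_ticks = ticks;

    /* returns the number of timers successfully armed (0 or 1 here) */
    return rte_event_timer_arm_burst(adapter, tims, 1);
}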
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
rte_event_ring enqueue and dequeue tail updates were hardcoded for a
SC/SP configuration.
Fixes: dc39e2f359 ("eventdev: add ring structure for events")
Cc: stable@dpdk.org
Signed-off-by: Mattias Rönnblom <hofors@lysator.liu.se>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
When an event device is stopped, it drains all event queues and ports.
These events may contain pointers, so to prevent memory leaks eventdev now
supports a user-provided flush callback that is called during the queue
drain process. This callback is stored in process memory, so the callback
must be registered by any process that may call rte_event_dev_stop().
This commit also clarifies the behavior of rte_event_dev_stop().
This follows this mailing list discussion:
http://dpdk.org/ml/archives/dev/2018-January/087484.html
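Registration is roughly as follows (a sketch; the callback shape and
registration function used here are assumptions based on the description
above, check rte_eventdev.h for the actual prototypes):

#include <rte_eventdev.h>
#include <rte_mbuf.h>

/* Sketch: free mbufs referenced by events drained during rte_event_dev_stop(). */
static void
drain_cb(uint8_t dev_id, struct rte_event ev, void *arg)
{
    (void)dev_id;
    (void)arg;
    rte_pktmbuf_free(ev.mbuf);
}

static int
setup_flush(uint8_t dev_id)
{
    /* must be done in every process that may call rte_event_dev_stop() */
    return rte_event_dev_stop_flush_callback_register(dev_id, drain_cb, NULL);
}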
Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Add a timestamp to received packets before enqueuing them to the
event device if the timestamp is not already set. Adding the
timestamp in the Rx adapter avoids additional latency due
to the event device.
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
In some cases we want the vhost dequeue to work in interrupt mode, to
release CPUs to others when there is no data to transmit. So we install
an interrupt handler for the vhost device and interrupt vectors for each
Rx queue when creating a new backend, according to the vhost interrupt
configuration. Thus, applications can register an epoll event fd
to associate Rx queues with interrupt vectors.
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Change the prototype and the behavior of dev_ops->eth_mac_addr_set(): a
return code is added to notify the caller (librte_ether) if an error
occurred in the PMD.
The new default MAC address is now copied in dev->data->mac_addrs[0]
only if the operation is successful.
The patch also updates all the PMDs accordingly.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
This patch adds public API implementation to vhost crypto.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds the implementation that parses virtio crypto request
to dpdk crypto operation.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds session message handler to vhost crypto.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds virtio-crypto spec user message structure to
vhost_user.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Previously, the vhost library lacked support for vhost backends
other than net, such as adding private data or registering vhost-user
message handlers. This patch fills the gap by adding a data pointer and
vhost-user pre and post message handlers to the vhost library.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Since the Linux kernel header file virtio_crypto.h was only merged
in 4.9, including this header file directly makes compilation
fail in old kernel environments, e.g. for the vhost crypto backend
series.
Add virtio_crypto.h to librte_vhost to make old kernels happy.
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Lei Gong <arei.gonglei@huawei.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The optimal values of several transmission & reception related
parameters, such as burst sizes, descriptor ring sizes, and number
of queues, vary between different network interface devices. This
patch allows individual PMDs to specify preferred parameter values.
Signed-off-by: Remy Horton <remy.horton@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
When the application works with LSC interrupts, the ethdev layer skips
the PMD callback and updates according to the link status present in the
device data. This is because it assumes the link status in the device data
is the correct one, since any link change is processed by the application.
As multiple PMDs install the link status interrupt handler only on port
start and uninstall it on port stop, the link status may be incorrect in
case the query is called after port stop or before port start.
Fix the query implementation to use the PMD callback for such cases.
Fixes: b77d21cc23 ("ethdev: add link status get/set helper functions")
Cc: stable@dpdk.org
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
The public struct rte_eth_dev_info has a "struct rte_pci_device" field in it
although this struct is common to all ethdevs on all buses.
Replace the PCI-specific struct with the generic device struct and update
the places that are using the PCI device in a way to get this information
from the generic device.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: David Marchand <david.marchand@6wind.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This patch adds APIs to enable live migration for non-builtin data paths.
At src side, last_avail/used_idx from the device need to be set into the
virtio_net structure, and the log_base and log_size from the virtio_net
structure need to be set into the device.
At dst side, last_avail/used_idx need to be read from the virtio_net
structure and set into the device.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adapts vhost lib for selective datapath by calling device ops
at the corresponding stage.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds APIs for datapath configuration.
The did (device id) of the vhost-user socket can be set to identify the
backend device, in which case each vhost-user socket can have only one
connection. The did is set to -1 by default when the software datapath is
used.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch set introduces support for selective datapath in DPDK vhost-user
lib. vDPA stands for vhost Data Path Acceleration. The idea is to support
virtio-ring-compatible devices that serve the virtio driver directly, to
enable datapath acceleration.
A set of device ops is defined for device specific operations:
a. get_queue_num: Called to get supported queue number of the device.
b. get_features: Called to get supported features of the device.
c. get_protocol_features: Called to get supported protocol features of
the device.
d. dev_conf: Called to configure the actual device when the virtio
device becomes ready.
e. dev_close: Called to close the actual device when the virtio device
is stopped.
f. set_vring_state: Called to change the state of the vring in the
actual device when vring state changes.
g. set_features: Called to set the negotiated features to device.
h. migration_done: Called to allow the device to respond to RARP
sending.
i. get_vfio_group_fd: Called to get the VFIO group fd of the device.
j. get_vfio_device_fd: Called to get the VFIO device fd of the device.
k. get_notify_area: Called to get the notify area info of the queue.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch exports vhost-user protocol features to support device driver
development.
Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
rte_hash_lookup_with_hash() has wrong comment for its 'sig' param.
Fixes: 1a9f648be2 ("hash: fix for multi-process apps")
Cc: stable@dpdk.org
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The first mbuf and the last mbuf to be visited in the preceding loop
are not set to NULL in the fragmentation table. This creates the
possibility of a double free when the fragmentation table is later freed
with rte_ip_frag_table_destroy().
Fixes: 95908f5239 ("ip_frag: free mbufs on reassembly table destroy")
Cc: stable@dpdk.org
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
In order to handle uevents detected from the kernel side, add a uevent
parse and process function to translate a uevent into the device event
which the user has subscribed to monitor.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
This patch aims to add a general device event monitor framework at the
EAL device layer, for device hotplug awareness and actions adopted
accordingly. It could also be expanded to all other types of device event
monitoring, but that is not in scope at this stage.
To get started, users first call the new APIs below to enable/disable
the device event monitor mechanism:
- rte_dev_event_monitor_start
- rte_dev_event_monitor_stop
Then users shall register or unregister callbacks through the new APIs.
Callbacks can be device-specific or for all devices:
- rte_dev_event_callback_register
- rte_dev_event_callback_unregister
Taking hotplug as an example: on device hotplug insertion or hotplug
removal, we get notified by the kernel, then call the user's callbacks
accordingly to handle it, such as detaching or attaching the device
from/to the bus, which can further benefit fail-safe or live migration.
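A sketch of the intended usage (the callback signature below is an
assumption for illustration; see rte_dev.h for the real prototypes):

#include <rte_dev.h>
#include <stdio.h>

/* Assumed callback shape: device name, event type, user argument. */
static void
hotplug_cb(const char *dev_name, enum rte_dev_event_type type, void *arg)
{
    (void)arg;
    if (type == RTE_DEV_EVENT_ADD)
        printf("device %s plugged in\n", dev_name);
    else if (type == RTE_DEV_EVENT_REMOVE)
        printf("device %s removed\n", dev_name);
}

static int
enable_hotplug_monitor(void)
{
    int ret = rte_dev_event_monitor_start();

    if (ret < 0)
        return ret;
    /* NULL device name: receive events for all devices (assumption) */
    return rte_dev_event_callback_register(NULL, hotplug_cb, NULL);
}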
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for
device event interrupt monitor.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
We only need to perform DMA mapping for the first device in the first group.
At the time of mapping, we haven't yet added the device into the group,
so the count is expected to be zero.
Fixes: 810bfa64c6 ("vfio: fix index for tracking devices in a group")
Fixes: a9c349e3a1 ("vfio: fix device unplug when several devices per group")
Fixes: 94c0776b1b ("vfio: support hotplug")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch moves some of the internal VFIO functions from
eal_vfio.h to rte_vfio.h for common use with the "rte_" prefix.
This patch also changes the FSLMC bus usages from the internal
VFIO functions to the external ones with the "rte_" prefix.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Bug https://dpdk.org/tracker/show_bug.cgi?id=18 indicated that several
mmap call sites in the [linux|bsd]app EAL code passed an fd that was not
-1 in their calls while using MAP_ANONYMOUS. While probably not a huge
deal, the man page does say the fd should be -1 for portability, as some
implementations don't ignore fd as they should for MAP_ANONYMOUS.
Suggested-by: Solal Pirelli <solal.pirelli@gmail.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The rte_ctrlmbuf structure is not used by any example application
in dpdk. Remove it, as announced on the mailing list.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Use __atomic_exchange_n instead of __atomic_exchange_(2/4/8).
The error was:
include/generic/rte_atomic.h:215:9: error:
implicit declaration of function '__atomic_exchange_2'
is invalid in C99
include/generic/rte_atomic.h:494:9: error:
implicit declaration of function '__atomic_exchange_4'
is invalid in C99
include/generic/rte_atomic.h:772:9: error:
implicit declaration of function '__atomic_exchange_8'
is invalid in C99
Fixes: ff2863570f ("eal: introduce atomic exchange operation")
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
It is common sense to expect a DPDK process not to deallocate any
pages that were preallocated by the "-m" or "--socket-mem" flags - yet,
currently, the DPDK memory subsystem will do exactly that once it finds
that the pages are unused.
Fix this by marking such pages as unfreeable, and preventing malloc from
ever trying to free them.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Before allocating a new page, give a chance to the user to
allow or deny allocation via callbacks.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This API will enable the application to register for notifications
on page allocations that are about to happen, giving the application
a chance to allow or deny the allocation when the resulting total memory
utilization would be above the specified limit on the specified socket.
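The intent can be sketched like this (the function and callback names
below are assumptions for illustration, not necessarily the final API):

#include <rte_memory.h>
#include <stddef.h>

/* Assumed callback shape: deny (return -1) if the new total would
 * exceed the limit the validator was registered with. */
static int
mem_limit_cb(int socket_id, size_t cur_limit, size_t new_len)
{
    (void)socket_id;
    if (new_len > cur_limit)
        return -1;      /* deny the page allocation */
    return 0;           /* allow */
}

static int
install_limit(int socket_id, size_t limit_bytes)
{
    /* the name identifies the validator so it can be unregistered later */
    return rte_mem_alloc_validator_register("app-limit", mem_limit_cb,
                                            socket_id, limit_bytes);
}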
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Now that every other piece of the puzzle is in place, enable non-legacy
init mode.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Enable callbacks on first device attach, disable callbacks
on last device detach.
PPC64 IOMMU does memseg walk, which will cause a deadlock on
trying to do it inside a callback, so provide a local,
thread-unsafe copy of memseg walk.
PPC64 IOMMU also may remap the entire memory map for DMA while
adding new elements to it, so change user map list lock to a
recursive lock. That way, we can safely enter rte_vfio_dma_map(),
lock the user map list, enter DMA mapping function and lock the
list again (for reading previously existing maps).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Callbacks will be triggered just after allocation and just
before deallocation, to ensure that memory address space
referenced in the callback is always valid by the time
callback is called.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Each process will have its own callbacks. Callbacks will indicate
whether it's an allocation or a deallocation that has happened, and will
also provide the start VA address and length of the allocated block.
Since memory hotplug isn't supported on FreeBSD or in legacy mem
mode, it will not be possible to register callbacks in either case.
Callbacks are called whenever something happens to the memory map of the
current process; at those times the memory hotplug subsystem
is write-locked, which leads to deadlocks on attempts to use these
functions. Document the limitation.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This enables multiprocess synchronization for memory hotplug
requests at runtime (as opposed to initialization).
Basic workflow is the following. Primary process always does initial
mapping and unmapping, and secondary processes always follow primary
page map. Only one allocation request can be active at any one time.
When primary allocates memory, it ensures that all other processes
have allocated the same set of hugepages successfully; otherwise
any allocations made are rolled back, and the heap is freed back.
Heap is locked throughout the process, and there is also a global
memory hotplug lock, so no race conditions can happen.
When primary frees memory, it frees the heap, deallocates affected
pages, and notifies other processes of deallocations. Since heap is
freed from that memory chunk, the area basically becomes invisible
to other processes even if they happen to fail to unmap that
specific set of pages, so it's completely safe to ignore results of
sync requests.
When secondary allocates memory, it does not do so by itself.
Instead, it sends a request to primary process to try and allocate
pages of specified size and on specified socket, such that a
specified heap allocation request could complete. Primary process
then sends all secondaries (including the requestor) a separate
notification of allocated pages, and expects all secondary
processes to report success before considering pages as "allocated".
Only after the primary process ensures that all memory has been
successfully allocated in all secondary processes will it respond
positively to the initial request and let the secondary proceed with
the allocation. Since the heap now has memory that can satisfy
allocation request, and it was locked all this time (so no other
allocations could take place), secondary process will be able to
allocate memory from the heap.
When secondary frees memory, it hides pages to be deallocated from
the heap. Then, it sends a deallocation request to primary process,
so that it deallocates pages itself, and then sends a separate sync
request to all other processes (including the requestor) to unmap
the same pages. This way, even if secondary fails to notify other
processes of this deallocation, that memory will become invisible
to other processes, and will not be allocated from again.
So, to summarize: address space will only become part of the heap
if primary process can ensure that all other processes have
allocated this memory successfully. If anything goes wrong, the
worst thing that could happen is that a page will "leak" and will
not be available to neither DPDK nor the system, as some process
will still hold onto it. It's not an actual leak, as we can account
for the page - it's just that none of the processes will be able
to use this page for anything useful, until it gets allocated from
by the primary.
Due to underlying DPDK IPC implementation being single-threaded,
some asynchronous magic had to be done, as we need to complete
several requests before we can definitively allow secondary process
to use allocated memory (namely, it has to be present in all other
secondary processes before it can be used). Additionally, only
one allocation request is allowed to be submitted at once.
Memory allocation requests are only allowed when there are no
secondary processes currently initializing. To enforce that,
a shared rwlock is used, that is set to read lock on init (so that
several secondaries could initialize concurrently), and write lock
on making allocation requests (so that either secondary init will
have to wait, or allocation request will have to wait until all
processes have initialized).
Any other function that wishes to iterate over memory or prevent
allocations should be using memory hotplug lock.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This set of changes enables rte_malloc to allocate and free memory
as needed. Currently, it is disabled because legacy mem mode is
enabled unconditionally.
The way it works is, first malloc checks if there is enough memory
already allocated to satisfy user's request. If there isn't, we try
and allocate more memory. The reverse happens with free - we free
an element, check its size (including free element merging due to
adjacency) and see if it's bigger than hugepage size and that its
start and end span a hugepage or more. Then we remove the area from
malloc heap (adjusting element lengths where appropriate), and
deallocate the page.
For legacy mode, runtime alloc/free of pages is disabled.
It is worth noting that memseg lists are being sorted by page size,
and that we try our best to satisfy user's request. That is, if
the user requests an element from a 2MB page memory, we will check
if we can satisfy that request from existing memory, if not we try
and allocate more 2MB pages. If that fails and user also specified
a "size is hint" flag, we then check other page sizes and try to
allocate from there. If that fails too, then, depending on flags,
we may try allocating from other sockets. In other words, we try
our best to give the user what they asked for, but going to other
sockets is last resort - first we try to allocate more memory on
the same socket.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Since we are going to need to map hugepages in both primary and
secondary processes, we need to know where we should look for
hugetlbfs mountpoints. So, share those with secondary processes,
and map them on init.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Add a new (non-legacy) memory init path for EAL. It uses the
new memory hotplug facilities.
If no -m or --socket-mem switches were specified, the new init
will not allocate anything, whereas if those switches were passed,
appropriate amounts of pages would be requested, just like for
legacy init.
Allocated pages will be physically discontiguous (or rather, they're
not guaranteed to be physically contiguous - they may still be so by
accident) unless RTE_IOVA_VA mode is used.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
For non-legacy memory init mode, instead of looking at generic
sysfs path, look at sysfs paths pertaining to each NUMA node
for hugepage counts. Note that per-NUMA node path does not
provide information regarding reserved pages, so we might not
get the best info from these paths, but this saves us from the
whole mapping/remapping business before we're actually able to
tell which page is on which socket, because we no longer require
our memory to be physically contiguous.
Legacy memory init will not use this.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
In preparation for implementing multiprocess support, we are adding
a version number to memseg lists. We will not need any locks, because
memory hotplug will have a global lock (so any time memory map and
thus version number might change, we will already be holding a lock).
There are two ways of implementing multiprocess support for memory
hotplug: either all information about mapped memory is shared
between processes, and secondary processes simply attempt to
map/unmap memory based on requests from the primary, or secondary
processes store their own maps and only check if they are in sync
with the primary process' maps.
This implementation will opt for the latter option: the primary process'
shared mappings will be authoritative, and each secondary process
will use its own internal view of mapped memory, and will attempt
to synchronize on these mappings using versioning.
Under this model, only primary process will decide which pages get
mapped, and secondary processes will only copy primary's page
maps and get notified of the changes via IPC mechanism (coming
in later commits).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>