This adds a new flag to request that a reserved memzone be IOVA-contiguous.
This is useful for allocating hardware resources like NIC rings/queues.
For now, hugepage memory is always contiguous, but we need to prepare the
drivers for the upcoming switch to dynamic memory allocation, where that
will no longer be guaranteed.
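A minimal usage sketch; the flag name RTE_MEMZONE_IOVA_CONTIG is an assumption
for illustration, as the commit text above does not name it:

#include <rte_memzone.h>
#include <rte_lcore.h>

/* Sketch: reserve a descriptor-ring area that must be IOVA-contiguous.
 * The flag name RTE_MEMZONE_IOVA_CONTIG is an assumption. */
static const struct rte_memzone *
reserve_hw_ring(const char *name, size_t len)
{
	return rte_memzone_reserve(name, len, rte_socket_id(),
			RTE_MEMZONE_IOVA_CONTIG);
}

On success, the memzone's iova field then refers to the start of a single
IOVA-contiguous region suitable for handing to hardware.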
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
No major changes, just add some checks in a few key places and a new
parameter to pass around.
Also, add a function to check a malloc element for physical
contiguousness. For now, hugepage memory is assumed to always be
contiguous, while non-hugepage memory is actually checked.
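A rough sketch of what such a check can look like, using rte_mem_virt2iova()
and a hypothetical helper name (not the actual function added by the patch):

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <rte_memory.h>

/* Hypothetical helper: walk the area in page-size steps and verify that
 * IOVAs increase together with the virtual addresses. */
static bool
mem_is_iova_contig(const void *start, size_t len, size_t pg_sz)
{
	const uint8_t *va = start;
	rte_iova_t expected = rte_mem_virt2iova(va);
	size_t off;

	for (off = 0; off < len; off += pg_sz) {
		if (rte_mem_virt2iova(va + off) != expected + off)
			return false;
	}
	return true;
}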
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We shouldn't ever panic in libraries, let alone in EAL, so
replace all panic messages with error messages.
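An illustration of the general pattern (not the actual diff), assuming a
worker-thread creation path as the example:

#include <pthread.h>
#include <rte_log.h>

static void *worker(void *arg) { return arg; }

/* Report the failure and return an error instead of calling rte_panic()
 * in library code, leaving the decision to abort to the application. */
static int
spawn_worker(pthread_t *tid)
{
	if (pthread_create(tid, NULL, worker, NULL) != 0) {
		RTE_LOG(ERR, EAL, "Cannot create worker thread\n");
		return -1;
	}
	return 0;
}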
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This will be needed because we need to know how big the new empty space
is, in order to check whether we can free some pages as a result.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We will need to be able to remove entries from a heap's free lists during
certain events, such as rollbacks, or when freeing memory back to the
system (where a previously existing element disappears and thus can no
longer be on a free list).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Down the line, we will need to join free segments to determine whether
the resulting contiguous free space is bigger than a page size, allowing
some memory to be freed back to the system.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
The malloc heap is now a doubly linked list, so it is possible to iterate
over each malloc element regardless of its state.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
As we are preparing for dynamic memory allocation, we need to be able to
handle holes in our malloc heap, hence we are switching to a doubly linked
list and preparing the infrastructure to support it.
Since the heap is now aware of where its first and last elements are,
there is no longer any need for a dummy element at the end of each heap,
so get rid of that as well. Instead, let the insert/remove/join/split
operations handle end-of-list conditions automatically.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Down the line, we will need to do everything from within the heap, as any
alloc or free may trigger allocating or freeing OS memory, which would
involve growing or shrinking the heap.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Move get_virtual_area out of the linuxapp EAL memory code and make it
common to EAL, so that other code can reserve virtual areas as well.
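A minimal sketch of the underlying idea; the real common helper also handles
address hints, alignment and extra flags:

#include <stddef.h>
#include <sys/mman.h>

/* Reserve (but do not commit) a chunk of virtual address space; memory
 * can later be mapped into it at fixed addresses. Sketch only. */
static void *
reserve_va_space(size_t size)
{
	void *addr = mmap(NULL, size, PROT_NONE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return addr == MAP_FAILED ? NULL : addr;
}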
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
We already set the IOVA addresses of memsegs and memzones to their VA
addresses during initialization, so we don't need to check whether we're
in RTE_IOVA_VA mode anywhere else.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
We already use VA addresses for IOVA purposes everywhere if we're in
RTE_IOVA_VA mode:
1) rte_malloc_virt2phy()/rte_malloc_virt2iova() always return VA addresses
2) Because of 1), memzone's IOVA is set to VA address on reserve
3) Because of 2), mempool's IOVA addresses are set to VA addresses
The only place where actual physical addresses are stored is in memsegs at
init time, but we're not using them anywhere, and there is no external API
to get those addresses (aside from manually iterating through memsegs), nor
should anyone care about them in RTE_IOVA_VA mode.
So, fix EAL initialization to allocate VA-contiguous segments at the start
without regard for physical addresses (as if they weren't available), and
use VA to set final IOVA addresses for all pages.
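A one-line sketch of what "use VA to set final IOVA addresses" means in
practice; the memseg field names are assumptions taken from rte_memory.h:

#include <stdint.h>
#include <rte_memory.h>

/* Sketch only: in RTE_IOVA_VA mode the final IOVA of a segment is simply
 * its virtual address. */
static void
set_iova_as_va(struct rte_memseg *ms)
{
	ms->iova = (rte_iova_t)(uintptr_t)ms->addr;
}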
Fixes: 62196f4e09 ("mem: rename address mapping function to IOVA")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Align the Mellanox SPDX copyrights to a single format.
In addition, convert to SPDX license tags the files that were previously
missed.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Replace the BSD license header with the SPDX tag for files
with a RehiveTech and Cavium copyright on them.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Replace the BSD license header with the SPDX tag for files
with only a RehiveTech copyright on them.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
This API provides a common set of actions for pipeline input ports to speed
up application development.
Each pipeline input port can be assigned an action handler to be executed
on every input packet during the pipeline execution.
The pipeline library allows the user to define their own input port
actions by providing a customized input port action handler. While the
user can still follow this process, this API is intended to provide a
quicker development alternative for a set of predefined actions.
The typical steps to use this API are:
* Define an input port action profile.
* Instantiate the input port action profile to create input port action
objects.
* Use the input port action to generate the input port action handler
invoked by the pipeline.
* Use the input port action object to generate the internal data structures
used by the input port action handler based on given action parameters.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Add implementations of different types of packet encapsulation,
such as VLAN, QinQ, MPLS, PPPoE, etc.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Add an API to specify action-related parameters, such as the action
handler, table entry data size, etc., for the pipeline table.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
This API provides a common set of actions for pipeline tables to speed up
application development.
Each match-action rule added to a pipeline table has associated data
that stores the action context. This data is input to the table
action handler called for every input packet that hits the rule as
part of the table lookup during the pipeline execution.
The pipeline library allows the user to define their own table actions by
providing customized table action handlers (table lookup) and complete
freedom in setting the rules and their data (table rule add/delete).
While the user can still follow this process, this API is intended to
provide a quicker development alternative for a set of predefined
actions.
The typical steps to use this API are:
* Define a table action profile.
* Instantiate the table action profile to create table action objects.
* Use the table action object to generate the pipeline table action
handlers (invoked by the pipeline table lookup operation).
* Use the table action object to generate the rule data (for the
pipeline table rule add operation) based on given action parameters.
* Use the table action object to read action data (e.g. stats counters)
for any given rule.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
During the lcore scan, find all socket IDs and store them, and provide a
public API to query valid socket IDs. This will break the ABI, so bump
the ABI version.
Also, remove the deprecation notice corresponding to this change.
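A usage sketch, assuming the new accessors are named rte_socket_count() and
rte_socket_id_by_idx() as in rte_lcore.h:

#include <stdio.h>
#include <rte_lcore.h>

/* Print every physical socket discovered during the lcore scan. */
static void
print_sockets(void)
{
	unsigned int i;

	for (i = 0; i < rte_socket_count(); i++)
		printf("physical socket %d is present\n",
			rte_socket_id_by_idx(i));
}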
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
This API is similar to the blocking API that is already present, but the
reply will be received in a separate callback by the caller (the callback
is specified at the time of the request, rather than registered in
advance).
Under the hood, we create a separate thread to deal with replies to
asynchronous requests, which will just wait to be notified by the main
thread, or be woken up on a timer.
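A sketch of the asynchronous flow; the function names and signatures below
are assumptions based on this release's rte_eal.h and may differ in detail:

#include <string.h>
#include <time.h>
#include <rte_eal.h>

/* Runs in the IPC reply-handling thread once replies arrive (or the
 * timeout fires). */
static int
reply_cb(const struct rte_mp_msg *request, const struct rte_mp_reply *reply)
{
	(void)request;
	(void)reply;
	return 0;
}

static int
send_async_request(void)
{
	struct rte_mp_msg req;
	struct timespec ts = { .tv_sec = 5, .tv_nsec = 0 };

	memset(&req, 0, sizeof(req));
	strcpy(req.name, "example_request");
	return rte_mp_request_async(&req, &ts, reply_cb);
}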
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Rename rte_mp_request to rte_mp_request_sync to indicate
that this request will be done synchronously (as opposed to
asynchronous request, which comes in next patch).
Also, fix the alphabetical ordering in the .map file.
Suggested-by: Thomas Monjalon <thomas@monjalon.net>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Originally, there was only one type of request which was used
for multiprocess synchronization (hence the name - sync request).
However, now that we are going to have two types of requests,
synchronous and asynchronous, having it named "sync request" is
very confusing, so we will rename it to "pending request". This
is internal-only, so no externally visible API changes.
Suggested-by: Jianfeng Tan <jianfeng.tan@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
GCC 8 discovers an issue with platform_mempool_ops.
rte_mbuf_pool_ops.c:26:3: error: ‘strncpy’ output truncated before
terminating nul copying as many bytes from a string as its length
[-Werror=stringop-truncation]
strncpy(mz->addr, ops_name, strlen(ops_name));
Since the ops_name is already checked for size, using strncpy
here is unnecessary; just use strcpy.
Fixes: a3acc3144a ("mbuf: add pool ops selection functions")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Fixes a potential memory overrun detected by Coverity.
This overrun cannot currently happen in practice because
rte_metrics_reg_names() explicitly forces the last name
character to be a NULL terminator.
This patch uses strlcpy instead of strncpy to copy name strings.
Coverity issue: 143434
Fixes: 349950ddb9 ("metrics: add information metrics library")
Fixes: 710cab6f67 ("metrics: fix out of bound access")
Signed-off-by: Remy Horton <remy.horton@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Since we have support for the strlcpy function in DPDK, replace all
instances where a string is copied using snprintf.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
The strncpy function is error prone for doing "safe" string copies, so
we generally try to use "snprintf" instead in the code. The function
"strlcpy" is a better alternative, since it better conveys the
intention of the programmer, and doesn't suffer from the non-NUL-
terminating behaviour of its 'n'-suffixed brethren.
The downside of this function is that it is not available by default
on Linux, though it is standard on the BSDs. It is available on most
distros by installing the "libbsd" package.
This patch therefore provides the following in rte_string_fns.h to ensure
that strlcpy is available there:
* for BSD, include string.h as normal
* if RTE_USE_LIBBSD is set, include <bsd/string.h>
* if not set, fallback to snprintf for strlcpy
With the make build system, RTE_USE_LIBBSD is hard-coded to "n", but when
using meson, it is automatically set based on what is available on the
platform.
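A rough sketch of the selection logic described in the list above; the real
logic lives in rte_string_fns.h and the macro spelling may differ slightly:

#ifdef RTE_EXEC_ENV_BSDAPP
#include <string.h>            /* strlcpy() is standard on the BSDs */
#elif defined(RTE_USE_LIBBSD)
#include <bsd/string.h>        /* strlcpy() provided by libbsd */
#else
#include <stdio.h>
/* Fallback: snprintf() also guarantees NUL termination. */
#define strlcpy(dst, src, size) snprintf(dst, size, "%s", src)
#endif

The return-value semantics of the snprintf fallback differ slightly from the
real strlcpy, which is acceptable for the copy-only call sites targeted here.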
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Add 32-bit and 64-bit APIs to align a given integer to the previous power
of 2. Update the common autotest to include tests for the previous power
of 2 for both 32-bit and 64-bit integers.
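A usage sketch, assuming the new helpers are named rte_align32prevpow2() and
rte_align64prevpow2() in rte_common.h:

#include <stdint.h>
#include <rte_common.h>

/* Round down to the nearest power of two. */
static void
prevpow2_examples(void)
{
	uint32_t a = rte_align32prevpow2(63);               /* -> 32 */
	uint64_t b = rte_align64prevpow2((1ULL << 40) + 1); /* -> 1ULL << 40 */

	RTE_SET_USED(a);
	RTE_SET_USED(b);
}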
Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
The recommended way to format size_t in printf is to use the z modifier,
which handles the case where size_t may be 32 or 64 bits.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This addresses potential issues where size_t and off_t can vary on some
platforms. For size_t, the best way to format the value is to use the z
modifier to printf. For off_t, the value needs to be cast to long long to
handle 64-bit offsets on 32-bit platforms.
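A minimal sketch of the portable formatting described above:

#include <stdio.h>
#include <stddef.h>
#include <sys/types.h>

/* Format size_t with %zu and off_t via an explicit cast to long long. */
static void
print_region(size_t len, off_t offset)
{
	printf("len = %zu, offset = %lld\n", len, (long long)offset);
}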
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
It's not necessary to populate guest memory from the vhost side unless
zero-copy is enabled or users want better performance.
Update the doc to clarify the guest memory requirement.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When vhost-user connects to QEMU successfully, DPDK calls
vhost_user_add_connection() to add the Unix socket fd to the poll set.
However, fdset_add() only stores the socket fd in an fdentry, while poll()
may already be sleeping. In the general case this is not a problem, but
when doing a hot update of vhost-user, the worst-case downtime of the VM's
network is 750+ ms. This patch adds a pipe event, so that once a
connection is established, DPDK rebuilds the poll set immediately. With
this patch, the worst-case downtime is 20-30 ms.
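A minimal sketch of the self-pipe wake-up pattern used here; the names are
illustrative, not the patch's actual symbols:

#include <unistd.h>

/* The fdset poll loop also watches pipefd[0], so writing a single byte
 * makes poll() return right away and the fd list is rebuilt without
 * waiting for the poll timeout. */
static int pipefd[2];

static int
fdset_pipe_init(void)
{
	return pipe(pipefd);    /* pipefd[0] is added to the poll set */
}

static void
fdset_notify(void)
{
	char c = 1;

	(void)write(pipefd[1], &c, 1);
}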
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The vhost.h file uses the bool type but does not include the stdbool.h
header. If other C files include vhost.h directly, there will be a
compile error.
This patch will be used in the next patch.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch adds a name for the vhost fdset thread.
It can help us to know whether the thread is running.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
When rte_vhost_driver_start() is first called, the fdset_event_dispatch
thread must be created successfully, because vhost uses it to poll socket
events for the vhost server or clients. Without it, for example, vhost
will not get the connection event.
This patch returns an error code directly when the thread cannot be
created.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
This patch aims at fixing a migration performance regression introduced
when an atomic operation started being used to log pages as dirty during
live migration.
Instead of setting a single bit with an atomic read-modify-write operation
to log a page as dirty, this patch writes 0xFF to the corresponding byte,
and so logs 8 pages as dirty.
The advantage is that it avoids concurrent atomic operations by multiple
PMD threads; the drawback is that some clean pages are marked as dirty and
so are transferred twice.
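A sketch of the idea (the helper name is illustrative):

#include <stdint.h>

/* Rather than an atomic read-modify-write that sets one bit, write 0xFF
 * to the byte covering the page, which marks that page (and the 7
 * neighbouring pages sharing the byte) as dirty. */
static inline void
log_dirty_page(uint8_t *log_base, uint64_t page)
{
	log_base[page / 8] = 0xff;
}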
Fixes: 897f13a1f7 ("vhost: make page logging atomic")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
The rte_eth_dev_pci_release() function wrongly releases the ethdev port
first and only then releases the internal fields of this port.
This behavior is problematic, because after the release, the port may
be reallocated again by another thread or just be invalid for any
usage.
Move the release operation to the end of the function.
Fixes: dcd5c8112b ("ethdev: add PCI driver helpers")
Cc: stable@dpdk.org
Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
More precisely, do not generate a SIGPIPE signal if the peer has closed
the connection; otherwise, it will terminate the process by default. As a
library, we should avoid terminating the application process when an
error happens and just need to return with an error.
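A sketch of the mechanism; the library's actual code path uses sendmsg(),
but the flag works the same way:

#include <sys/socket.h>
#include <sys/types.h>

/* With MSG_NOSIGNAL, writing to a socket whose peer has closed the
 * connection returns -1 with errno set to EPIPE instead of raising
 * SIGPIPE and terminating the process. */
static ssize_t
send_no_sigpipe(int sockfd, const void *buf, size_t len)
{
	return send(sockfd, buf, len, MSG_NOSIGNAL);
}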
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This function will be used to send fds to QEMU via slave channel.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The device must be started before starting any queue.
Fixes: 0748be2cf9 ("ethdev: queue start and stop")
Cc: stable@dpdk.org
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
"struct rte_eth_rxtx_callback" is defined as internal data structure and
used as named opaque type.
So the functions that are adding callbacks can return objects in this
type instead of void pointer.
Also const qualifier added to "struct rte_eth_rxtx_callback *" to
protect it better from application modification.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Dynamic log types are registered at the RTE_INIT() step. This allows one
to set log levels via EAL options at application launch. However, it does
not make it possible to manage log types that are created at runtime.
EAL does not store the log levels and types passed from the command line,
so they cannot be picked up later. This is an obvious flaw, since it would
be better to be able to pick up levels for dynamic types registered for
runtime-determined facilities such as NIC ports.
This patch provides a mechanism to store log levels passed from EAL
options and adds an API to register log types and pick up levels from the
internal storage.
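A usage sketch, assuming the new API is rte_log_register_type_and_pick_level()
and using a hypothetical log type name:

#include <rte_log.h>

/* Register a runtime-determined log type and let EAL apply any matching
 * --log-level option stored at startup, falling back to a default. */
static int
register_dynamic_logtype(const char *name)
{
	int type = rte_log_register_type_and_pick_level(name, RTE_LOG_NOTICE);

	if (type < 0)
		type = RTE_LOGTYPE_USER1;
	return type;
}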
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Andy Moreton <amoreton@solarflare.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Many drivers copy/paste the same code to atomically update the link
status. Reduce the duplication, and allow for future changes, by having a
common function for this.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
To handle the atomic update of the link status (64 bits), every driver
was doing its own version using cmpset.
Atomic exchange is a useful primitive in its own right; therefore make it
an EAL routine.
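A usage sketch, assuming the new routine is rte_atomic64_exchange() in
rte_atomic.h:

#include <stdint.h>
#include <rte_atomic.h>

/* Atomically store a new 64-bit value (e.g. a packed link status) and
 * return the previous one. */
static uint64_t
publish_link_word(volatile uint64_t *dst, uint64_t new_val)
{
	return rte_atomic64_exchange(dst, new_val);
}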
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
From time to time, someone sends patches about unlinking existing
sockets when registering a vhost user in server mode.
A recent example:
http://dpdk.org/ml/archives/dev/2018-February/090025.html
This problem has been discussed many times, and it was made clear that
the library should not unlink files given by the application in order
to avoid possible security problems, such as removing random files
used by other programs.
One of the first discussions:
http://dpdk.org/ml/archives/dev/2015-December/030326.html
To avoid such patches in the future, it was decided to add a comment
that explains what is happening and tries to describe the reasoning.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
In the 18.02 release the ABI of the ethdev component was changed.
To keep compatibility with previous versions of the library, versioning
of the rte_eth_dev_filter_ctrl function was implemented.
Since a deprecation notice was already issued in the 18.02 release, there
is no need to keep compatibility with previous versions.
Remove the versioning of rte_eth_dev_filter_ctrl function.
Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The current code compares two strings up to the length of the first
string (the searched name). If the first string is a prefix of the second
string (an existing name), the comparison returns the port_id of the
earliest prefix match.
This patch fixes the bug by using strcmp instead of strncmp.
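An illustration of the fix (helper name is illustrative):

#include <string.h>

/* An exact name match must use strcmp(); strncmp(a, b, strlen(a)) also
 * "matches" when a is merely a prefix of b. */
static int
name_matches(const char *searched, const char *existing)
{
	return strcmp(searched, existing) == 0;
}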
Fixes: 9c5b8d8b9f ("ethdev: clean port id retrieval when attaching")
Cc: stable@dpdk.org
Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
According to the "Vhost-user Protocol" document,
VHOST_USER_GET_VRING_BASE should get the available vring base offset.
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
LOG_DEBUG is a symbol defined by POSIX, so if syslog.h is included the
symbols conflict.
This patch changes LOG_DEBUG to VHOST_LOG_DEBUG.
Fixes: 1c01d52392 ("vhost: add debug print")
Cc: stable@dpdk.org
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Previously, get_device() was a regular function call. That is fine for
slow-path configuration, but costs some cycles on the data path.
To avoid that, turn this function into an inline one.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When the reallocation of guest pages fails, vhost_user_set_mem_table()
should also fail.
Fixes: e246896178 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This prevents destroying and recreating the user device in an
"incomplete" vring state. virtio_is_ready() was returning true for
devices with vrings that did not have a valid callfd (their
VHOST_USER_SET_VRING_CALL had not arrived yet).
Fixes: 8f972312b8 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
QEMU always sets the offset to 0, but for sanity we should take the
offset into account.
Fixes: 54f9e32305 ("vhost: handle dirty pages logging request")
Cc: stable@dpdk.org
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If memory_size + mmap_offset overflows then the memory region is bogus.
Do not use the overflowed mmap_size value for mmap().
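A sketch of the check (helper name is illustrative): a region whose size plus
offset wraps around is bogus and must be rejected before the sum reaches
mmap().

#include <stdint.h>

/* Returns non-zero when memory_size + mmap_offset does not wrap. */
static int
region_bounds_ok(uint64_t memory_size, uint64_t mmap_offset)
{
	return memory_size + mmap_offset >= mmap_offset;
}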
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Check the virtqueue size constraints so that invalid values don't cause
bugs later on in the code. For example, sometimes the virtqueue size is
stored as unsigned int and sometimes as uint16_t, so bad things happen
if it is ever larger than 65535.
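A sketch of the kind of sanity check meant here; the exact limits enforced by
the patch are not stated above, and the power-of-two requirement is taken from
the virtio split-ring specification:

#include <stdbool.h>
#include <stdint.h>

/* Reject sizes that are zero, not a power of two, or too large for the
 * 16-bit fields used elsewhere in the code. Sketch only. */
static bool
vring_size_valid(uint32_t size)
{
	return size != 0 && size <= UINT16_MAX && (size & (size - 1)) == 0;
}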
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
vhost_user_set_vring_addr() uses the msg->payload.addr union member, not
msg->payload.state. Luckily the offset of the 'index' field is
identical in both structs, so there was never any buggy behavior.
Fixes: 5cd690e4fd ("vhost: fix vring addresses not translated")
Cc: stable@dpdk.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
If the log base mmap_offset is larger than mmap_size then it points
outside the mmap region. We must not write to memory outside the mmap
region, so validate mmap_offset in vhost_user_set_log_base().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The number of file descriptors received is not stored by vhost_user.c.
vhost_user_set_mem_table() assumes that memory.nregions matches the
number of file descriptors received, but nothing guarantees this:
for (i = 0; i < memory.nregions; i++)
close(pmsg->fds[i]);
Another questionable code snippet is:
case VHOST_USER_SET_LOG_FD:
close(msg.fds[0]);
If not enough file descriptors were received then fds[] contains
uninitialized data from the stack (see read_fd_message()). This might
cause non-vhost file descriptors to be closed if the uninitialized data
happens to match.
Refactoring vhost_user.c to pass around and check the number of file
descriptors everywhere would make the code more complex. It is simpler
for read_fd_message() to set unused elements in fds[] to -1. This way
close(-1) is called and no harm is done.
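A sketch of the approach (parameter names are illustrative):

/* After receiving a message, mark every fd slot that was not filled so
 * that a stray close(fds[i]) later on becomes a harmless close(-1). */
static void
mark_unused_fds(int *fds, int fds_received, int fds_total)
{
	int i;

	for (i = fds_received; i < fds_total; i++)
		fds[i] = -1;
}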
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Check if memory.nregions is valid right away. This eliminates the
possibility of bugs when memory.nregions is used later on in
vhost_user_set_mem_table().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The VhostUserMsg struct binary representation must match the vhost-user
protocol specification since this struct is read from and written to the
socket.
The VhostUserMsg.request union contains enum fields. Enum binary
representation is implementation-defined according to the C standard and
it is unportable to make assumptions about the representation:
6.7.2.2 Enumeration specifiers
...
Each enumerated type shall be compatible with char, a signed integer
type, or an unsigned integer type. The choice of type is
implementation-defined, but shall be capable of representing the
values of all the members of the enumeration.
Additionally, librte_vhost relies on the enum type being unsigned when
validating untrusted inputs:
if (ret <= 0 || msg.request.master >= VHOST_USER_MAX) {
If msg.request.master is signed then negative values pass this check!
Even if we assume gcc on x86_64 (SysV amd64 ABI) and don't care about
portability, the actual enum constants still affect the final type. For
example, if we add a negative constant then its type changes to signed
int:
typedef enum VhostUserRequest {
...
VHOST_USER_INVALID = -1,
};
This is very fragile and it's unlikely that anyone changing the code
would remember this. A security hole can be introduced accidentally.
This patch switches VhostUserMsg.request fields to uint32_t to avoid the
portability and potential security issues.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Input validation is not applied consistently in vhost_user.c. This
suggests that not everyone has the same security model in mind when
working on the code.
Make the security model explicit so that everyone can understand and
follow the same model when modifying the code.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Use commas as separators, not semicolons.
Fixes: a8b97e3a1d ("devargs: use a comma instead of semicolon to separate key/values")
Cc: stable@dpdk.org
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Remove the duplicated symbol rte_pci_device_name from the .map file.
Also sort the map file to make it easier to detect any possible
duplication in the future.
Fixes: 0e3ef055be ("pci: fix namespace prefix of new functions")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This patch moves the kernel module code from EAL to a common place,
separating the kernel module code from the user space code.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
If we receive messages that don't have a callback registered for them,
and we haven't finished initialization yet, it can be reasonably inferred
that we shouldn't have gotten the message in the first place. Therefore,
send the requester a special message telling it to ignore the response to
this request, as if this process weren't there.
Since it is not possible for the primary process to receive any messages
during initialization, this change in practice only applies to secondary
processes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
When sending IPC messages, prevent new sockets from initializing.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Currently, the filter value is hardcoded and disconnected from the actual
value returned by eal_mp_socket_path(). Fix this by deriving the filter
value from eal_mp_socket_path() instead.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>
Currently, primary process initialization is finalized by setting the
RTE_MAGIC value in the shared config. However, it is not possible to
check whether secondary process initialization has completed. Add such a
value to the internal config.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Unlocking the action list before sending a message and locking it again
afterwards introduces a window where a response might arrive before we
have a chance to start waiting on the condition, resulting in timeouts on
valid messages.
Fixes: 783b6e5497 ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>