numam-dpdk

Author	SHA1	Message	Date
Tiwei Bie	e218fa09f4	vhost: fix desc access in relay helpers Descs in desc table should be indexed using the desc idx instead of the idx of avail ring and used ring. Fixes: `b13ad2decc` ("vhost: provide helpers for virtio ring relay") Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2019-01-14 17:44:29 +01:00
Ilya Maximets	9726aa9907	eal: fix build of external app with clang on armv8 In case DPDK is built using GCC, RTE_TOOLCHAIN_CLANG is not defined. But 'rte_atomic.h' is a generic header that is included by external apps like OVS when building with DPDK. As a result, a clang build of OVS fails on armv8 if DPDK was built using gcc: include/generic/rte_atomic.h:215:9: error: implicit declaration of function '__atomic_exchange_2' is invalid in C99 include/generic/rte_atomic.h:494:9: error: implicit declaration of function '__atomic_exchange_4' is invalid in C99 include/generic/rte_atomic.h:772:9: error: implicit declaration of function '__atomic_exchange_8' is invalid in C99 We need to check for the current compiler, not the compiler used for the DPDK build. Fixes: `7bdccb9307` ("eal: fix ARM build with clang") Cc: stable@dpdk.org Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2019-01-14 19:49:48 +01:00
Yongseok Koh	bf78d4dc2b	mbuf: remove experimental tag for external attachment Remove the experimental tag of rte_pktmbuf_attach_extbuf() which was introduced in 18.05. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2019-01-14 16:37:59 +01:00
Yongseok Koh	952f4cf5f0	mbuf: remove deprecated macro RTE_MBUF_INDIRECT() is replaced with RTE_MBUF_CLONED() and removed. This macro was deprecated in release 18.05 when EXT_ATTACHED_MBUF was introduced. Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2019-01-14 16:37:36 +01:00
Harry van Haaren	6af649a44c	mbuf: fix C++ compatibility by making sched struct visible Although C compilation works with the struct rte_mbuf_sched declared inside the struct rte_mbuf namespace, C++ fails to compile. This fix moves the rte_mbuf_sched struct up to the global namespace, instead of declaring it inside the struct mbuf namespace. The struct rte_mbuf_sched is being used on the stack in rte_mbuf_sched_get() and as a cast in _set(). For this reason, it must be exposed as an available type. Fixes: `5d3f721009` ("mbuf: implement generic format for sched field") Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2019-01-14 16:03:28 +01:00
Anatoly Burakov	ba07193e03	mem: fix storing old policy The original code was supposed to overwrite the value pointed to by the pointer, but the new one is instead overwriting the pointer value itself, which has no effect outside that function. Fix it by adding a pointer dereference. Fixes: `582bed1e1d` ("mem: support mapping hugepages at runtime") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-14 15:50:52 +01:00
Anatoly Burakov	199629022c	mem: fix variable shadowing A local variable ``flags`` was shadowing another variable from outer scope. Fix this by renaming the variable and make it const. Fixes: `c127be93f6` ("mem: support using memfd segments for in-memory mode") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-14 15:42:40 +01:00
Anatoly Burakov	c0f8d50d1c	vfio: do not unregister callback in secondary process Callbacks are only registered in the primary, so do not attempt to unregister callbacks in secondary processes. Fixes: `43e4631371` ("vfio: support memory event callbacks") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-14 15:31:51 +01:00
Anatoly Burakov	97257eee2d	eal/bsd: remove clean up of files at startup On FreeBSD, closing the file descriptor drops the lock even if the file descriptor was mmap'ed. This leads to the cleanup at the end of EAL init to remove fbarray files that are still in use by the process itself. However, instead of working around this issue, we can take advantage of the fact that FreeBSD doesn't really create any per-process files in the first place, so no cleanup is actually needed. Fixes: `0a529578f1` ("eal: clean up unused files on initialization") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-14 15:23:12 +01:00
Anatoly Burakov	66d9f61de0	eal: fix strdup usages in internal config Currently, we use strdup in a few places to store command-line parameter values for certain internal config values. There are several issues with that. First of all, they're never freed, so memory ends up leaking either after EAL exit, or when these command-line options are supplied multiple times. Second of all, they're defined as `const char *`, so they cannot be freed even if we wanted to. Finally, strdup may return NULL, which will be stored in the config. For most fields, NULL is a valid value, but for the default prefix, the value is always expected to be valid. To fix all of this, three things are done. First, we change the definitions of these values to `char *` as opposed to `const char *`. This does not break the ABI, and previous code assumes constness (which is more restrictive), so it's safe to do so. Then, fix all usages of strdup to check return value, and add a cleanup function that will free the memory occupied by these strings, as well as freeing them before assigning a new value to prevent leaks when parameter is specified multiple times. And finally, add an internal API to query hugefile prefix, so that, absent of a valid value, a default value will be returned, and also fix up all usages of hugefile prefix to use this API instead of accessing hugefile prefix directly. Bugzilla ID: 108 Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-14 15:05:19 +01:00
Konstantin Ananyev	b73cec26cd	ipsec: fix assert condition fix invalid RTE_ASSERT condition in rsn_update_finish() Fixes: `c0308cd895` ("ipsec: rework SA replay window/SQN for MT environment") Reported-by: Ferruh Yigit <ferruh.yigit@intel.com> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-01-14 14:45:02 +01:00
Bruce Richardson	efa8088663	build: fix variable name in dependency error message The variable name in the error message had an extra '_' which caused an actual meson error when the message would otherwise be printed to give meaningful information about what was going wrong. Fixes: `203b61dc5e` ("build: improve error message for missing dependency") Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Luca Boccassi <bluca@debian.org>	2019-01-14 12:24:57 +01:00
Konstantin Ananyev	f901d9c826	ipsec: add helpers to group completed crypto-ops Introduce helper functions to process completed crypto-ops and group related packets by sessions they belong to. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Declan Doherty <declan.doherty@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Konstantin Ananyev	c0308cd895	ipsec: rework SA replay window/SQN for MT environment With these changes, the functions rte_ipsec_pkt_crypto_prepare() and rte_ipsec_pkt_process() can be safely used in an MT environment, as long as the user can guarantee that they obey the multiple readers/single writer model for SQN+replay_window operations. To be more specific: for an outbound SA there are no restrictions; for an inbound SA the caller has to guarantee that at any given moment only one thread is executing rte_ipsec_pkt_process() for the given SA. Note that it is the caller's responsibility to maintain the correct order of packets to be processed. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Declan Doherty <declan.doherty@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Konstantin Ananyev	4d7ea3e145	ipsec: implement SA data-path API Provide implementation for rte_ipsec_pkt_crypto_prepare() and rte_ipsec_pkt_process(). The current implementation: supports ESP protocol tunnel mode; supports ESP protocol transport mode; supports ESN and replay window; supports algorithms AES-CBC, AES-GCM, HMAC-SHA1, NULL; covers all currently defined security session types (RTE_SECURITY_ACTION_TYPE_NONE, RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO, RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL, RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL). For the first two types, SQN check/update is done by SW (inside the library). For the last two types, it is the HW/PMD's responsibility. Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Declan Doherty <declan.doherty@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Konstantin Ananyev	1e0ad1e36d	ipsec: add SA data-path API Introduce Security Association (SA-level) data-path API Operates at SA level, provides functions to: - initialize/teardown SA object - process inbound/outbound ESP/AH packets associated with the given SA (decrypt/encrypt, authenticate, check integrity, add/remove ESP/AH related headers and data, etc.). Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Declan Doherty <declan.doherty@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Konstantin Ananyev	9f7b43141c	lib: introduce IPsec library Introduce librte_ipsec library. The library is supposed to utilize existing DPDK crypto-dev and security API to provide application with transparent IPsec processing API. That initial commit provides some base API to manage IPsec Security Association (SA) object. Signed-off-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Declan Doherty <declan.doherty@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Konstantin Ananyev	19b08e5406	net: add ESP trailer structure definition define esp_tail structure. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com> Acked-by: Declan Doherty <declan.doherty@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Konstantin Ananyev	58a8e49a98	security: add opaque userdata pointer into security session Add 'uint64_t opaque_data' inside struct rte_security_session. That allows upper layer to easily associate some user defined data with the session. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com> Acked-by: Declan Doherty <declan.doherty@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	b1d978fc7b	cryptodev: add opaque data field to symmetric session This patch adds an opaque data field to the cryptodev symmetric session. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	5d6c73dd59	cryptodev: add reference count to session private data This patch adds a refcnt field to every session private data in the cryptodev symmetric session. The counter is used to prevent a symmetric session from being freed before it has been cleared by every type of crypto device in use. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	9e5f5ecb5e	cryptodev: add user data size to symmetric session This patch adds a user_data_sz field to cryptodev symmetric session. The field is used to check if reading or writing the session's user data field is eligible. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	e764cd72a9	cryptodev: update symmetric session structure This patch updates the rte_cryptodev_sym_session structure for cryptodev library. The updates include a changed session private data array and an added nb_drivers field. They are used to calculate the correct session header size and ensure safe access of the session private data. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	0b60386ac3	cryptodev: add sym session header size function This patch adds a new API in Cryptodev Framework. The API is used to get the header size for the created symmetric Cryptodev session. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	ac5e42daca	vhost/crypto: use separate session mempools This patch uses the two session mempool approach to vhost crypto. One mempool is for session header objects, and the other is for session private data. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	1d6f89885e	cryptodev: add sym session mempool create This patch adds a new API "rte_cryptodev_sym_session_pool_create()" to cryptodev library. All applications are required to use this API to create sym session mempool as it adds private data and nb_drivers information to the mempool private data. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Fan Zhang	725d2a7fbf	cryptodev: change queue pair configure structure This patch changes the cryptodev queue pair configure structure to allow two mempools to be passed into the cryptodev PMD simultaneously. Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-01-10 16:57:22 +01:00
Eelco Chaudron	655796d2b5	meter: support RFC4115 trTCM This patch adds support for RFC4115 trTCM meters. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2019-01-10 00:34:09 +01:00
Thomas Monjalon	7637518249	version: 19.02-rc1 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-12-23 00:21:13 +01:00
Tonghao Zhang	03b7fd7e54	sched: fix memory leak on init failure In some cases, we may create a sched port dynamically; if an error occurs during creation, memory will leak. Fixes: `de3cfa2c98` ("sched: initial import") Cc: stable@dpdk.org Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>	2018-12-22 00:22:57 +01:00
Reshma Pattan	5d3f721009	mbuf: implement generic format for sched field This patch implements the changes proposed in the deprecation notes [1][2]. librte_mbuf changes: The mbuf->hash.sched field is updated to support generic definition in line with the ethdev traffic manager and meter APIs. The new generic format contains: queue ID, traffic class, color. Added public APIs to set and get these new fields to and from mbuf. librte_sched changes: In addition, the following API functions of the sched library have been modified with an additional parameter of type struct rte_sched_port to accommodate the changes made to mbuf sched field. (i) rte_sched_port_pkt_write() (ii) rte_sched_port_pkt_read_tree_path() librte_pipeline, qos_sched UT, qos_sched app are updated to make use of new changes. Also mbuf->hash.txadapter has been added for eventdev txq, rte_event_eth_tx_adapter_txq_set and rte_event_eth_tx_adapter_txq_get() are updated to use mbuf->hash.txadapter.txq. doc: Release notes updated. Removed deprecation notice for mbuf->hash.sched and sched API. [1] http://mails.dpdk.org/archives/dev/2018-February/090651.html [2] https://mails.dpdk.org/archives/dev/2018-November/119051.html Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com> Signed-off-by: Reshma Pattan <reshma.pattan@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Olivier Matz <olivier.matz@6wind.com> Tested-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Nikhil Rao <nikhil.rao@intel.com>	2018-12-22 00:22:44 +01:00
Reshma Pattan	c712b01326	meter: unify packet color definition Added new rte_color definition in librte_meter to consolidate color definition which is currently replicated in various places such as rte_meter.h, rte_tm.h and rte_mtr.h Created aliases for rte_tm_color, rte_mtr_color and rte_meter_color to use new rte_color values. The definitions of rte_tm_color, rte_mtr_color and rte_meter_color will be deprecated in future. Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com> Signed-off-by: Reshma Pattan <reshma.pattan@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2018-12-20 19:00:10 +01:00
Bruce Richardson	fff6df7bf5	telemetry: fix using ports of different types Different NIC ports can have different numbers of xstats on them, which means that we can't just use the xstats list from the first port registered in the telemetry library. Instead, we need to check the type of each port - by checking its ops structure pointer - and register each port type once with the metrics lib. Fixes: `fdbdb3f9ce` ("telemetry: add initial connection socket") Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Kevin Laatz <kevin.laatz@intel.com>	2018-12-22 03:23:06 +01:00
Maxime Coquelin	b473ec1131	vhost: batch used descs chains write-back with packed ring Instead of writing back descriptors chains in order, let's write the first chain flags last in order to improve batching. Also, move the write barrier in logging cache sync, so that it is done only when logging is enabled. It means there is now one more barrier for split ring when logging is enabled. With Kernel's pktgen benchmark, ~3% performance gain is measured. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>	2018-12-21 16:22:41 +01:00
Maxime Coquelin	815814c4ff	vhost: remove useless prefetch for packed ring descriptor This prefetch does not show any performance improvement. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com>	2018-12-21 16:22:41 +01:00
Maxime Coquelin	aaf8979d6f	vhost: prefetch descriptor after the read barrier This patch moves the prefetch after the available index is read to avoid prefetching a descriptor not available yet. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com>	2018-12-21 16:22:41 +01:00
Maxime Coquelin	33e12d63d1	vhost: enforce desc flags and content read ordering A read barrier is required to ensure that the ordering between descriptor's flags and content reads is enforced. The sequence is: (1) read flags = desc->flags; if (flags & AVAIL_BIT), (2) read desc->id. There is a control dependency between steps 1 and 2: step 2 could be speculatively executed before step 1, which could result in 'id' not being updated yet. Fixes: `2f3225a7d6` ("vhost: add vector filling support for packed ring") Cc: stable@dpdk.org Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com>	2018-12-21 16:22:41 +01:00
Maxime Coquelin	d4ff2135eb	vhost: enforce avail index and desc read ordering A read barrier is required to ensure the ordering between available index and the descriptor reads is enforced. The sequence is: (1) read avail_head = avail->idx; (2) read cur_idx = last_avail_idx; if (cur_idx != avail_head) { (3) read idx = avail->ring[cur_idx]; (4) read desc[idx] }. There is a control dependency between step 1 and steps 3 & 4: step 3 could be speculatively executed before step 1, which could result in 'idx' not being updated yet. Fixes: `4796ad63ba` ("examples/vhost: import userspace vhost application") Cc: stable@dpdk.org Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com>	2018-12-21 16:22:41 +01:00
Bruce Richardson	8743d499a5	net: fix underflow for checksum of invalid IPv4 packets If we receive a packet with an invalid IP header, where the total packet length is reported as less than the IP header length, we would end up getting an underflow in the length subtraction. This could cause us to checksum e.g. 4GB of data in the case where the result of the subtraction was -1. We fix this by having the function return 0 - an invalid sum - when the length is less than the header length. Fixes: `af75078fec` ("first public release") Fixes: `6006818cfb` ("net: new checksum functions") Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>	2018-12-21 16:22:41 +01:00
Xiao Wang	b13ad2decc	vhost: provide helpers for virtio ring relay This patch provides two helpers for vdpa device driver to perform a relay between the guest virtio ring and a mediated virtio ring. The available ring relay will synchronize the available entries, and help to do desc validity checking. The used ring relay will synchronize the used entries from mediated ring to guest ring, and help to do dirty page logging for live migration. The later patch will leverage these two helpers. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-21 16:22:40 +01:00
Xiao Wang	43f34e3566	vhost: provide helper for host notifier ctrl VDPA driver can decide if it needs to enable/disable the host notifier mapping, so exposing an API allows flexibility. A later patch will build on this. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-21 16:22:40 +01:00
Xiao Wang	02e3b285d4	vhost: remove unused function vhost_detach_vdpa_device() is internally defined but not used, remove it in this patch. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-21 16:22:40 +01:00
Matthias Gatto	276d63505b	vhost: fix race condition when adding fd in the fdset fdset_add can call fdset_shrink_nolock, which calls fdset_move concurrently with the poll that is called in fdset_event_dispatch. This patch adds a mutex to protect poll from being called at the same time as fdset_add calls fdset_shrink_nolock. Fixes: `1b815b8959` ("vhost: try to shrink pfdset when fdset_add fails") Cc: stable@dpdk.org Signed-off-by: Matthias Gatto <matthias.gatto@outscale.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-21 16:22:40 +01:00
Anatoly Burakov	ba731ea1dd	malloc: fix deadlock when reading stats Currently, malloc statistics and external heap creation code use memory hotplug lock as a way to synchronize accesses to heaps (as in, locking the hotplug lock to prevent list of heaps from changing under our feet). At the same time, malloc statistics code will also lock the heap because it needs to access heap data and does not want any other thread to allocate anything from that heap. In such a scheme, it is possible to enter a deadlock with the following sequence of events: thread 1 calls rte_malloc() and takes the heap lock; thread 2 calls rte_malloc_dump_stats() and takes the hotplug lock; thread 1 fails to allocate and attempts to take the hotplug lock; thread 2 attempts to take the heap lock. Neither thread will be able to continue, as both of them are waiting for the other one to drop the lock. Adding an additional lock will require an ABI change, so instead of that, make malloc statistics calls thread-unsafe with respect to creating/destroying heaps. Fixes: `72cf92b318` ("malloc: index heaps using heap ID rather than NUMA node") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-21 15:26:43 +01:00
Honnappa Nagarahalli	d5c677db89	hash: fix out-of-bound write while freeing key slot Add a debug check for out-of-bound write while freeing the key slot. Coverity issue: 325733 Fixes: `e605a1d36c` ("hash: add lock-free r/w concurrency") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-12-21 01:53:33 +01:00
Jeff Shaw	0f48ca429b	hash: fix return of bulk lookup The __rte_hash_lookup_bulk() function returns void, and therefore should not return with an expression. This commit fixes the following compiler warning when attempting to compile with "-pedantic -std=c11". warning: ISO C forbids ‘return’ with expression, in function returning void [-Wpedantic] Fixes: `9eca8bd7a6` ("hash: separate lock-free and r/w lock lookup") Cc: stable@dpdk.org Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-12-21 01:41:18 +01:00
Liang Ma	e6c6dc0f96	power: add p-state driver compatibility Previously, in order to use the power library, it was necessary for the user to disable the intel_pstate driver by adding “intel_pstate=disable” to the kernel command line for the system, which causes the acpi_cpufreq driver to be loaded in its place. This patch adds the ability for the power library use the intel-pstate driver. It adds a new suite of functions behind the current power library API, and will seamlessly set up the user facing API function pointers to the relevant functions depending on whether the system is running with acpi_cpufreq kernel driver, intel_pstate kernel driver or in a guest, using kvm. The library API and ABI is unchanged. Signed-off-by: Liang Ma <liang.j.ma@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: David Hunt <david.hunt@intel.com>	2018-12-21 01:33:59 +01:00
Qi Zhang	85d6815fa6	eal: close multi-process socket during cleanup When a secondary process quits, the mp_socket* file still exists, which causes rte_mp_request_sync to fail when trying to send a message on a floating socket. The patch fixes the issue by introducing a function rte_mp_channel_cleanup. This function will be called by rte_eal_cleanup and it will close the mp socket and delete the mp_socket* file. Fixes: `bacaa27540` ("eal: add channel for multi-process communication") Cc: stable@dpdk.org Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>	2018-12-21 01:15:41 +01:00
Anatoly Burakov	9d65053761	eal: add 64-bit log2 function Add missing implementation for 64-bit log2 function, and extend the unit test to test this new function. Also, remove duplicate reimplementation of this function from testpmd and memalloc. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-21 00:23:49 +01:00
Anatoly Burakov	43c9e6c205	eal: add 64-bit fls function Add missing implementation for 64-bit fls function, and extend unit test to test the new function as well. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-21 00:17:43 +01:00
Anatoly Burakov	4e261f5519	eal: add 64-bit bsf and 32-bit safe bsf functions Add an rte_bsf64 function that follows the convention of existing rte_bsf32 function. Also, add missing implementation for safe version of rte_bsf32, and implement unit tests for all recently added bsf varieties. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-21 00:00:58 +01:00
Anatoly Burakov	cc7ddb00da	bitmap: remove deprecated 64-bit bsf function The function rte_bsf64 was deprecated in a previous release, so remove the function, and the deprecation notice associated with it. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 23:44:56 +01:00
Anatoly Burakov	307315d457	eal: fix runtime directory cleanup in noshconf mode When using --no-shconf or --in-memory modes, there is no runtime directory to be created, so there is no point in attempting to clean it. Fixes: `0a529578f1` ("eal: clean up unused files on initialization") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 23:27:35 +01:00
Anatoly Burakov	c75f535ac5	mem: use memfd for no-huge mode When running in no-huge mode, we anonymously allocate our memory. While this works for regular NICs and vdev's, it's not suitable for memory sharing scenarios such as virtio with vhost_user backend. To fix this, allocate no-huge memory using memfd, and register it with memalloc just like any other memseg fd. This will enable using rte_memseg_get_fd() API with --no-huge EAL flag. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-20 22:58:25 +01:00
Anatoly Burakov	df7722c75b	mem: allow setting up segment list fd Currently, only segment fd's for multi-file segments are supported, while for memfd-backed no-huge memory we need single-file segments mode. Add support for single-file segments in the internal API. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-20 22:55:56 +01:00
Anatoly Burakov	d75eea3145	mem: check for memfd support in segment fd API If memfd support was not compiled, or hugepage memfd support is not available at runtime, the API will now return proper error code, indicating that this API is unsupported. This changes the API, so document the changes. Fixes: `41dbdb6872` ("mem: add external API to retrieve page fd") Fixes: `3a44687139` ("mem: allow querying offset into segment fd") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-20 22:54:37 +01:00
Anatoly Burakov	525670756a	mem: fix segment fd API error code for external segment Segment fd API does not support getting segment fd's from externally allocated memory, so return proper error code on any attempts to do so. This changes API behavior, so document the change as well. Fixes: `5282bb1c36` ("mem: allow memseg lists to be marked as external") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-20 22:51:49 +01:00
Anatoly Burakov	bed7941886	mem: allow usage of non-heap external memory in multiprocess Add multiprocess support for externally allocated memory areas that are not added to DPDK heap (and add relevant doc sections). Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 18:14:55 +01:00
Anatoly Burakov	950e8fb4e1	mem: allow registering external memory areas The general use-case of using external memory is well covered by existing external memory API's. However, certain use cases require manual management of externally allocated memory areas, so this memory should not be added to the heap. It should, however, be added to DPDK's internal structures, so that API's like ``rte_virt2memseg`` would work on such external memory segments. This commit adds such an API to DPDK. The new functions will allow to register and unregister externally allocated memory areas, as well as documentation for them. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 18:14:55 +01:00
Anatoly Burakov	39ff94e71c	malloc: separate destroying memseg list and heap data Currently, destroying external heap chunk and its memseg list is part of one process. When we will gain the ability to unregister external memory from DPDK that doesn't have any heap structures associated with it, we need to be able to find and destroy memseg lists as well as heap data separately. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 18:10:08 +01:00
Anatoly Burakov	0f526d674f	malloc: separate creating memseg list and malloc heap Currently, creating external malloc heap involves also creating a memseg list backing that malloc heap. We need to have them as separate functions, to allow creating memseg lists without creating a malloc heap. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 18:09:55 +01:00
Anatoly Burakov	646e5260ee	malloc: make alignment requirements more stringent The external heaps API already implicitly expects start address of the external memory area to be page-aligned, but it is not enforced or documented. Fix this by implementing additional parameter checks at memory add call, and document the page alignment requirement explicitly. Fixes: `7d75c31014` ("malloc: allow adding memory to named heaps") Cc: stable@dpdk.org Suggested-by: Yongseok Koh <yskoh@mellanox.com> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 15:34:03 +01:00
Anatoly Burakov	b3e735e16e	malloc: fix duplicate mem event notification We already trigger a mem event notification inside the walk function, no need to do it twice. Fixes: `f32c7c9de9` ("malloc: enable event callbacks for external memory") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 15:28:55 +01:00
Seth Howell	fba0ca2274	malloc: notify primary process about hotplug in secondary When secondary process hotplugs memory, it sends a request to primary, which then performs the real mmap() and sends sync requests to all secondary processes. Upon receiving such sync request, each secondary process will notify the upper layers of hotplugged memory (and will call all locally registered event callbacks). In the end we'll end up with memory event callbacks fired in all the processes except the primary, which is a bug. This gets critical if memory is hotplugged while a VFIO device is attached, as the VFIO memory registration - which is done from a memory event callback present in the primary process only - is never called. After this patch, a primary process fires memory event callbacks before secondary processes start their synchronizations - both for hotplug and hotremove. Fixes: `07dcbfe010` ("malloc: support multiprocess memory hotplug") Cc: stable@dpdk.org Signed-off-by: Seth Howell <seth.howell@intel.com> Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 15:25:34 +01:00
Yongseok Koh	6d09256148	malloc: fix finding maximum contiguous IOVA size malloc_elem_find_max_iova_contig() could return invalid size due to a missing sanity check. The following gdb output shows how 'cur_size' can be invalid in find_biggest_element(). (gdb) p/x cur_size $4 = 0xffffffffffe42900 (gdb) p elem $1 = (struct malloc_elem *) 0x12e842000 (gdb) p *elem $2 = {heap = 0x7ffff7ff387c, prev = 0x12e831fc0, next = 0x12e842900, free_list = {le_next = 0x109538000, le_prev = 0x7ffff7ff3894}, msl = 0x7ffff7ff107c, state = ELEM_FREE, pad = 0, size = 2304} (gdb) p *elem->msl $5 = {{base_va = 0x100200000, addr_64 = 4297064448}, page_sz = 2097152, socket_id = 0, version = 790, len = 17179869184, external = 0, memseg_arr = {name = "memseg-2048k-0-0", '\000' <repeats 47 times>, count = 493, len = 8192, elt_sz = 48, data = 0x10002e000, rwlock = {cnt = 0}}} Fixes: `9fe6bceafd` ("malloc: add finding biggest free IOVA-contiguous element") Cc: stable@dpdk.org Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 15:17:48 +01:00
Jim Harris	476c847ab6	malloc: add option --match-allocations SPDK uses the rte_mem_event_callback_register API to create RDMA memory regions (MRs) for newly allocated regions of memory. This is used in both the SPDK NVMe-oF target and the NVMe-oF host driver. DPDK creates internal malloc_elem structures for these allocated regions. As users malloc and free memory, DPDK will sometimes merge malloc_elems that originated from different allocations that were notified through the registered mem_event callback routine. This results in subsequent allocations that can span across multiple RDMA MRs. This requires SPDK to check each DPDK buffer to see if it crosses an MR boundary, and if so, would have to add considerable logic and complexity to describe that buffer before it can be accessed by the RNIC. It is somewhat analogous to rte_malloc returning a buffer that is not IOVA-contiguous. As a malloc_elem gets split and some of these elements get freed, it can also result in DPDK sending an RTE_MEM_EVENT_FREE notification for a subset of the original RTE_MEM_EVENT_ALLOC notification. This is also problematic for RDMA memory regions, since unregistering the memory region is all-or-nothing. It is not possible to unregister part of a memory region. To support these types of applications, this patch adds a new --match-allocations EAL init flag. When this flag is specified, malloc elements from different hugepage allocations will never be merged. Memory will also only be freed back to the system (with the requisite memory event callback) exactly as it was originally allocated. Since part of this patch is extending the size of struct malloc_elem, we also fix up the malloc autotests so they do not assume its size exactly fits in one cacheline. Signed-off-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 13:01:08 +01:00
Gao Feng	cc80353223	memzone: fix unlock on initialization failure The RTE_PROC_PRIMARY error handler is missing the unlock statement in the current code. Now unlock and return in one place to fix it. Fixes: `49df3db848` ("memzone: replace memzone array with fbarray") Cc: stable@dpdk.org Signed-off-by: Gao Feng <davidfgao@tencent.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 12:24:14 +01:00
Gao Feng	32fa7f8913	eal: check peer allocation in multi-process request Add the check for null peer pointer like the bundle pointer in the mp request handler. They should follow same style. And add some logs for nomem cases. Signed-off-by: Gao Feng <davidfgao@tencent.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 00:01:28 +01:00
Gao Feng	e14bc93e8f	eal: fix leak on multi-process request error When rte_eal_alarm_set failed, need to free the bundle mem in the error handler of handle_primary_request and handle_secondary_request. Fixes: `244d513071` ("eal: enable hotplug on multi-process") Fixes: `ac9e4a1737` ("eal: support attach/detach shared device from secondary") Cc: stable@dpdk.org Signed-off-by: Gao Feng <davidfgao@tencent.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 00:01:28 +01:00
Gaetan Rivet	c9b413c3b1	eal: fix detection of duplicate option register Missing brackets around the if means that the loop will end at its first iteration. Fixes: `2395332798` ("eal: add option register infrastructure") Cc: stable@dpdk.org Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2018-12-20 00:01:28 +01:00
Keith Wiles	e3b090f3da	eal: fix missing newline in a log Add a missing newline to a RTE_LOG message. Fixes: `2395332798` ("eal: add option register infrastructure") Cc: stable@dpdk.org Signed-off-by: Keith Wiles <keith.wiles@intel.com>	2018-12-20 00:01:28 +01:00
Chas Williams	7a838c8798	ip_frag: fix IPv6 when MTU sizes not aligned to 8 bytes The same issue was fixed for the IPv4 version of this routine in commit `8d4d3a4f73` ("ip_frag: handle MTU sizes not aligned to 8 bytes"). Briefly, the size of an IPv6 header is always 40 bytes. With an MTU of 1500, this will never produce a multiple of 8 bytes for the frag_size and this routine can never succeed. Since RTE_ASSERTS are disabled by default, this failure is typically ignored. To fix this, round down to the nearest 8 bytes and use this when producing the fragments. Fixes: `0aa31d7a59` ("ip_frag: add IPv6 fragmentation support") Cc: stable@dpdk.org Signed-off-by: Chas Williams <chas3@att.com> Acked-by: Luca Boccassi <bluca@debian.org> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2018-12-19 22:40:08 +01:00
Konstantin Ananyev	d5b46fc363	rwlock: introduce try semantics Introduce rte_rwlock_read_trylock() and rte_rwlock_write_trylock(). Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com>	2018-12-19 20:56:11 +01:00
Erik Gabriel Carrillo	7079e29f7f	timer: fix race condition rte_timer_manage() adds expired timers to a "run list", and walks the list, transitioning each timer from the PENDING to the RUNNING state. If another lcore resets or stops the timer at precisely this moment, the timer state would instead be set to CONFIG by that other lcore, which would cause timer_manage() to skip over it. This is expected behavior. However, if a timer expires quickly enough, there exists the following race condition that causes the timer_manage() routine to misinterpret a timer in CONFIG state, resulting in lost timers: - Thread A: - starts a timer with rte_timer_reset() - the timer is moved to CONFIG state - the spinlock associated with the appropriate skiplist is acquired - timer is inserted into the skiplist - the spinlock is released - Thread B: - executes rte_timer_manage() - find above timer as expired, add it to run list - walk run list, see above timer still in CONFIG state, unlink it from run list and continue on - Thread A: - move timer to PENDING state - return from rte_timer_reset() - timer is now in PENDING state, but not actually linked into a pending list or a run list and will never get processed further by rte_timer_manage() This commit fixes this race condition by only releasing the spinlock after the timer state has been transitioned from CONFIG to PENDING, which prevents rte_timer_manage() from seeing an incorrect state. Fixes: `9b15ba895b` ("timer: use a skip list") Cc: stable@dpdk.org Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com>	2018-12-19 20:56:09 +01:00
Amr Mokhtar	56b878b0ba	bbdev: add missing experimental tags and map entries - add missing APIs to map file - add experimental tag to all bbdev APIs Signed-off-by: Amr Mokhtar <amr.mokhtar@intel.com>	2018-12-19 19:36:53 +01:00
Kamil Chalupnik	0b98d574e3	bbdev: enhance throughput test Improvements added to throughput test: - test is run in loop (number of iterations is specified by TEST_REPETITIONS define) which ensures more accurate results - length of input data is calculated based on amount of CBs in TB - maximum number of decoding iterations is gathered from results - added new functions responsible for printing results - small fixes for memory management Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com> Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>	2018-12-19 11:19:10 +01:00
Kamil Chalupnik	9fa6ebde8e	bbdev: enhance offload cost test Offload cost test was improved in order to collect more accurate results. Signed-off-by: Kamil Chalupnik <kamilx.chalupnik@intel.com> Acked-by: Amr Mokhtar <amr.mokhtar@intel.com>	2018-12-19 11:19:10 +01:00
Lee Daly	9d3e1cb135	compressdev: fix structure comment Fixes incorrect comment on compressdev rte_comp_op structure element. Comment needed to be updated to be compliant with the use of chained mbufs. Fixes: `f87bdc1ddc` ("compressdev: add compression specific data") Cc: stable@dpdk.org Signed-off-by: Lee Daly <lee.daly@intel.com> Acked-by: Fiona Trahe <fiona.trahe@intel.com>	2018-12-19 11:19:10 +01:00
Fiona Trahe	5eb0d610a5	compressdev: add bulk free operation API There's an API to bulk allocate operations, this adds a corresponding bulk free API. Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Shally Verma <shally.verma@caviumnetworks.com> Acked-by: Lee Daly <lee.daly@intel.com>	2018-12-19 11:19:10 +01:00
Nikhil Rao	5bd4ae2d77	eventdev: fix eth Tx adapter queue count checks rte_event_eth_tx_adapter_queue_add() - add a check that returns an error if the ethdev has zero Tx queues configured. rte_event_eth_tx_adapter_queue_del() - remove the checks for ethdev queue count, instead check for queues added to the adapter, which may be different from the current ethdev queue count. Fixes: `a3bbf2e097` ("eventdev: add eth Tx adapter implementation") Cc: stable@dpdk.org Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>	2018-12-17 20:25:10 +01:00
Gage Eads	1f7a110269	eventdev: fix xstats documentation typo The eventdev extended stats documentation referred to two non-existent functions, rte_eventdev_xstats_get and rte_eventdev_get_xstats_by_name. Fixes: `3ed7fc039a` ("eventdev: add extended stats") Cc: stable@dpdk.org Signed-off-by: Gage Eads <gage.eads@intel.com>	2018-12-16 18:28:07 +01:00
Erik Gabriel Carrillo	ac0fc54a49	eventdev: remove redundant timer adapter function prototypes Fixes: `6750b21bd6` ("eventdev: add default software timer adapter") Cc: stable@dpdk.org Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>	2018-12-16 17:22:14 +01:00
Nikhil Rao	91c1667da0	eventdev: fix error log in eth Rx adapter strerror() input parameter should be > 0. Coverity issue: 302864 Fixes: `3810ae4357` ("eventdev: add interrupt driven queues to Rx adapter") Cc: stable@dpdk.org Signed-off-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-12-16 17:22:14 +01:00
Jiayu Hu	f8a05885e7	gro: fix overflow of payload length calculation When the packet length is smaller than the header length, the calculated payload length will be overflowed and result in incorrect reassembly behaviors. Fixes: `1e4cf4d6d4` ("gro: cleanup") Fixes: `9e0b9d2ec0` ("gro: support VxLAN GRO") Cc: stable@dpdk.org Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>	2018-12-19 04:29:57 +01:00
Anatoly Burakov	0a529578f1	eal: clean up unused files on initialization When creating process data structures, EAL will create many files in EAL runtime directory. Because we allow multiple secondary processes to run, each secondary process gets their own unique file. With many secondary processes running and exiting on the system, runtime directory will, over time, create enormous amounts of sockets, fbarray files and other stuff that just sits there unused because the process that allocated it has died a long time ago. This may lead to exhaustion of disk (or RAM) space in the runtime directory. Fix this by removing every unlocked file at initialization that matches either socket or fbarray naming convention. We cannot be sure of any other files, so we'll leave them alone. Also, remove similar code from mp socket code. We do it at the end of init, rather than at the beginning, because secondary process will use primary process' data structures even if the primary itself has died, and we don't want to remove those before we lock them. Bugzilla ID: 106 Cc: stable@dpdk.org Reported-by: Vipin Varghese <vipin.varghese@intel.com> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-19 04:12:30 +01:00
David Marchand	a8499f65a1	log: add missing experimental tag When rte_log_register_type_and_pick_level() has been introduced, it has been correctly added to the EXPERIMENTAL section of the eal map and the symbol itself has been marked at its definition. However, the declaration of this symbol in rte_log.h is missing the __rte_experimental tag. Because of this, a user can try to call this symbol without being aware this is an experimental api (neither compilation nor link warning). Fixes: `b22e77c026` ("eal: register log type and pick level from args") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2018-12-19 02:30:02 +01:00
Jeff Shaw	68687daff2	eal: remove unnecessary dirent.h include Prior to this patch, the two affected .c files include <dirent.h> unnecessarily. This commit removes the include lines. Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com> Reviewed-by: Rami Rosen <ramirose@gmail.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-12-19 01:29:36 +01:00
Tiwei Bie	e9436f54af	pdump: remove deprecated APIs We already changed to use generic IPC in pdump since below commit: commit `660098d61f` ("pdump: use generic multi-process channel") The `rte_pdump_set_socket_dir()`, the `path` parameter of `rte_pdump_init()` and the `enum rte_pdump_socktype` have been deprecated since then. This commit removes these deprecated APIs and also bumps the pdump ABI. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Acked-by: Reshma Pattan <reshma.pattan@intel.com>	2018-12-19 01:25:56 +01:00
Ilya Maximets	48cae0bfa6	vhost: fix double read of descriptor flags Flags could be updated in a separate process leading to the inconsistent check. Additionally, read marked as 'volatile' to highlight the shared nature of the variable and avoid such issues in the future. Fixes: `d3211c98c4` ("vhost: add helpers for packed virtqueues") Cc: stable@dpdk.org Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-13 18:17:42 +00:00
Maxime Coquelin	cf14478d77	vhost: fix crash after mmap failure If mmap() call fails in vhost_user_set_mem_table, dev->mem is set to NULL. If later, qva_to_vva() is called, a segfault occurs. Fixes: `8f972312b8` ("vhost: support vhost-user") Cc: stable@dpdk.org Reviewed-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com>	2018-12-13 17:56:21 +00:00
Yaroslav Brustinov	b4b896fcfe	ethdev: fix typo in queue setup error log '=' should be '>=' for '[rt]x_desc_lim.nb_min' check. Fixes: `386c993e95` ("ethdev: add a missing sanity check for Tx queue setup") Fixes: `80a1deb4c7` ("ethdev: add API to retrieve queue information") Cc: stable@dpdk.org Signed-off-by: Yaroslav Brustinov <ybrustin@cisco.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-12-13 17:45:59 +00:00
Thomas Monjalon	37d800031d	version: 19.02-rc0 Start version numbering for a new release cycle, and introduce a template file for release notes. The release notes comments are updated to mandate a scope label for API and ABI changes. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2018-11-30 16:20:33 +00:00
Thomas Monjalon	0da7f445df	version: 18.11.0 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-27 00:36:00 +01:00
Thomas Monjalon	c5f21bdae4	fix indentation in symbol maps Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Allain Legacy <allain.legacy@windriver.com>	2018-11-26 20:16:46 +01:00
Anatoly Burakov	e45088b1e1	mem: fix division by zero in no-NUMA mode When RTE_EAL_NUMA_AWARE_HUGEPAGES is set to "n", not all memtypes will be valid, because we skip some due to not supporting other NUMA nodes, leading to a division by zero error down the line because the necessary memtype fields weren't populated. Fix it by limiting number of memtypes to number of memtypes we have actually created. Fixes: `1dd342d0fd` ("mem: improve segment list preallocation") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: David Hunt <david.hunt@intel.com>	2018-11-26 15:35:46 +01:00
Thomas Monjalon	6cff3183c2	version: 18.11-rc5 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-25 21:19:19 +01:00
Darek Stojaczyk	161419983d	eal: fix devargs reference after probing failure Even if a device failed to plug, it's still a device object that references the devargs. Those devargs will be freed automatically together with the device, but freeing them any earlier - like it's done in the hotplug error handling path right now - will give us a dangling pointer and a segfault scenario. Consider the following case: * secondary process receives the hotplug request IPC message * devargs are either created or updated * the bus is scanned * a new device object is created with the latest devargs * the device can't be plugged for whatever reason, bus->plug returns error * the devargs are freed, even though they're still referenced by the device object on the bus For PCI devices, the generic device name comes from a buffer within the devargs. Freeing those will make EAL segfault whenever the device name is checked. This patch just prevents the hotplug error handling path from removing the devargs when there's a device that references them. This is done by simply exiting early from the hotplug function. As mentioned in the beginning, those devargs will be freed later, together with the device itself. Fixes: `7e8b266501` ("eal: fix hotplug add / remove") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-25 13:45:35 +01:00
Darek Stojaczyk	29bf7e93ba	eal: fix devargs leak on multi-process detach request Device detach triggered through IPC leaked some memory. It allocated a devargs object just to use it for parsing the devargs string in order to retrieve the device name. Those devargs weren't passed anywhere and were never freed. First of all, let's put those devargs on the stack, so they don't need to be freed. Then free the additional arguments string as soon as it's allocated, because we won't need it. Fixes: `ac9e4a1737` ("eal: support attach/detach shared device from secondary") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>	2018-11-25 13:32:01 +01:00
Darek Stojaczyk	494db286f3	eal: fix multi-process hotplug if attached in secondary Consider the following scenario: 1) primary process (A) starts, probes the bus 2) a secondary process (B) starts, probes the bus 3) yet another secondary process (C) starts 4) (C) registers the pci driver and hotplugs the device * an IPC attach req is sent to the primary (A) * (A) ignores the -EEXIST from process-local probe * (A) propagates the request to all secondary processes * (B) responds with -EEXIST * (A) replies to the original request with the -EEXIST return code * the -EEXIST is returned back to the user, although the device was successfully attached both locally and in all other processes This patch makes the primary process reply with rc=0 even if there was another secondary process with the device already attached. The primary process already didn't reply with -EEXIST when the device was attached locally, so now this behavior is even more consistent. Looking by the code, this seems to be the originally intended behavior. Fixes: `ac9e4a1737` ("eal: support attach/detach shared device from secondary") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>	2018-11-25 13:27:17 +01:00
Darek Stojaczyk	d27eed3139	eal: fix multi-process hotplug if already probed When primary process receives an IPC attach request of a device that's already locally-attached, it doesn't setup its variables properly and is prone to segfaulting on a subsequent rollback. `ret = local_dev_probe(req->devargs, &dev)` The above function will set `dev` pointer to the proper device unless it returns with error. One of those errors is -EEXIST, which the hotplug function explicitly ignores. For -EEXIST, it proceeds with attaching the device and expects the dev pointer to be valid. This patch makes `local_dev_probe` set the dev pointer even if it returns -EEXIST. Fixes: `ac9e4a1737` ("eal: support attach/detach shared device from secondary") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>	2018-11-25 13:22:51 +01:00
Darek Stojaczyk	5d36bf2bcd	eal: fix multi-process hotplug rollback If a device fails to attach before it's plugged, the subsequent rollback will still try to detach it, causing a segfault. Unplugging a device that wasn't plugged isn't really supported, so this patch adds an extra error check to prevent that from happening. While here, fix this also for normal (non-rollback) detach, which could also theoretically segfault on non-plugged device. Fixes: `244d513071` ("eal: enable hotplug on multi-process") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>	2018-11-25 13:15:34 +01:00
Ilya Maximets	9e8b90fc6d	eal/bsd: fix possible IOPL fd leak If rte_eal_iopl_init() will be called more than once we'll leak the file descriptor. Fixes: `b46fe31862` ("eal/bsd: fix virtio on FreeBSD") Cc: stable@dpdk.org Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-11-25 11:44:25 +01:00
Maxime Coquelin	5a12b67e74	vhost: fix packed ring constants declaration The packed ring defines were declared only if kernel header does not declare them. The problem is that they are not applied in upstream kernel, and some changes in the names have been required. This patch declares the defines unconditionally, which fixes potential build issues. Fixes: `297b1e7350` ("vhost: add virtio packed virtqueue defines") Cc: stable@dpdk.org Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-11-22 23:06:26 +01:00
Ferruh Yigit	8461a5bb70	ethdev: remove unused deferred device state The DEFERRED state was replaced by the ownership concept and is no longer used, as the code comment states. The ethdev ABI is already broken in this release, so use this opportunity to remove the DEFERRED state. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Matan Azrad <matan@mellanox.com>	2018-11-21 16:11:14 +01:00
Akhil Goyal	f63ffee26f	security: restore experimental tag for unimplemented APIs Following APIs are not currently implemented by any of the drivers, so marking them as rte_experimental again. Fixes: `1a81dce780` ("security: remove experimental tag") rte_security_get_userdata; rte_security_session_stats_get; rte_security_session_update; Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>	2018-11-23 02:03:33 +01:00
Nikhil Rao	e846cfdec3	eventdev: fix unlock in Rx adapter In the eth Rx adapter SW service function, move the return to after the spinlock unlock. Coverity issue: 302857 Fixes: `a66a837446` ("eventdev: fix Rx SW adapter stop") Cc: stable@dpdk.org Signed-off-by: Nikhil Rao <nikhil.rao@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>	2018-11-23 02:03:33 +01:00
Thomas Monjalon	6b8d9a4b4c	eventdev: fix possible uninitialized variable When compiling with -O1, this error can appear: lib/librte_eventdev/rte_event_eth_tx_adapter.c:705:6: error: ‘ret’ may be used uninitialized in this function If tx_queue_id is -1 and nb_queues is 0, then ret is returned without being initialized. It is fixed by setting 0 as initial value. Fixes: `a3bbf2e097` ("eventdev: add eth Tx adapter implementation") Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-23 01:43:42 +01:00
Thomas Monjalon	a17842c142	kni: fix possible uninitialized variable This error can be raised: lib/librte_kni/rte_kni.c:531:15: error: 'req' may be used uninitialized in this function It should not happen because kni_fifo_get() would return 0 if req is not initialized, so the function would return before using req. But GCC complains about it in -O1 optimization, and a NULL initialization is harmless here. Fixes: `3fc5ca2f63` ("kni: initial import") Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-23 01:43:35 +01:00
Thomas Monjalon	e357e8ebd9	eal: fix build with -O1 In case of optimized compilation, RTE_BUILD_BUG_ON uses an external variable which is neither defined, nor used. It seems not optimized out in case of OPDL compiled with clang -O1: opdl_ring.c: undefined reference to `RTE_BUILD_BUG_ON_detected_error' clang-6.0: fatal error: linker command failed with exit code 1 Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-23 01:43:32 +01:00
Anatoly Burakov	509cc88513	eal: deprecate and rename bsf64 function Rename rte_bsf64 to rte_bsf64_safe (this is a "safe" version in that it prevents undefined behavior by checking if incoming parameter is zero) and move it to common header. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Acked-by: Jasvinder Singh <jasvinder.singh@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-23 01:43:31 +01:00
Anatoly Burakov	816c924e9e	eal: remove useless code in bsf64 function RTE_BITMAP_OPTIMIZATIONS was never set to 0 and makes no sense anyway, so remove all code related to it. Also, drop the "likely" for bsf64 code, because it's a generic function and we cannot make any assumptions about likely values of incoming arguments. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2018-11-23 01:43:26 +01:00
Anatoly Burakov	615fcf55d2	ipc: fix access after async request failure Previous fix for rte_panic has moved setting of alarm before sending the message. This means that whether or not we send a message, the alarm would still trigger. The comment noted that cleanup would happen in the alarm handler, but that's not what actually happened - instead, in the event of failed send we freed the memory in-place, before putting the request on the queue. This works OK when the message is sent, but when sending the message fails, the alarm would still trigger with a pointer argument that points to non-existent memory, and cause memory corruption. There probably is a "proper" fix for this issue, with correct handling of sent vs. unsent requests, however it would be simpler just to sacrifice the sent request in the (extremely unlikely) event of alarm set failing. The other process would still send a response, but it will be ignored by the sender. Fixes: `45e5f49e87` ("ipc: remove panic in async request") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-23 01:43:24 +01:00
Thomas Monjalon	d82e5db6f6	version: 18.11-rc4 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-19 01:40:54 +01:00
Akhil Goyal	1a81dce780	security: remove experimental tag rte_security has been experimental since DPDK 17.11 release. Now the library has matured and the experimental tag is removed in this patch. Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com> Acked-by: Anoob Joseph <anoob.joseph@caviumnetworks.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Boris Pismenny <borisp@mellanox.com>	2018-11-18 22:31:30 +01:00
Jeff Guo	c48407e8af	eal: fix deadlock in hot-unplug When a device is hot-unplugged, the hot-unplug handler is invoked by the uio remove event and the device is detached; the kernel then sends another PCI remove event. If any unlock is missed, this causes a deadlock. This patch adds the missing unlock to the hot-unplug handler. Fixes: `0fc54536b1` ("eal: add failure handling for hot-unplug") Signed-off-by: Jeff Guo <jia.guo@intel.com>	2018-11-18 17:16:40 +01:00
Chaitanya Babu Talluri	f493119397	efd: fix write unlock during ring creation In rte_efd_create() write lock has already been unlocked before ring creation itself. So second unlock after the ring creation has been removed. Fixes: `56b6ef874f` ("efd: new Elastic Flow Distributor library") Cc: stable@dpdk.org Signed-off-by: Chaitanya Babu Talluri <tallurix.chaitanya.babu@intel.com> Acked-by: Reshma Pattan <reshma.pattan@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-18 15:46:02 +01:00
David Wilder	6b062d56bc	mem: fix anonymous mapping on Power9 Removed the use of MAP_HUGETLB for anonymous mapping on ppc64. The MAP_HUGETLB had previously been added to workaround issues on IBM Power8 systems when mapping /dev/zero. In the current code the MAP_HUGETLB flag will cause the anonymous mapping to fail on Power9. Note, Power8 is currently failing to correctly mmap Hugepages, with and without this change. Fixes: `284ae3e9ff` ("eal/ppc: fix mmap for memory initialization") Signed-off-by: David Wilder <dwilder@us.ibm.com> Reviewed-by: Pradeep Satyanarayana <pradeep@us.ibm.com>	2018-11-18 14:42:18 +01:00
Anatoly Burakov	71aae4b421	malloc: fix adjacency check to also include segment list It may so happen that two memory locations may be adjacent in virtual memory, but belong to different segment lists. With current code, such segments will be concatenated. Fix the adjacency checking code to also check if the adjacent malloc elements belong to the same memseg list. Fixes: `66cc45e293` ("mem: replace memseg with memseg lists") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-18 14:15:04 +01:00
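The adjacency rule described in the commit above can be illustrated with a minimal sketch. The `struct elem` type and `elem_can_join()` are hypothetical stand-ins for illustration only, not the DPDK malloc element or its helpers; the point is just that address adjacency alone is not sufficient.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a malloc element; not the DPDK structure. */
struct elem {
	uintptr_t start;  /* virtual address where the element begins */
	size_t size;      /* element size in bytes */
	const void *msl;  /* owning memseg list, treated as opaque here */
};

/*
 * Two elements may be concatenated only if b starts exactly where a ends
 * AND both belong to the same segment list.
 */
static bool elem_can_join(const struct elem *a, const struct elem *b)
{
	if (a->start + a->size != b->start)
		return false;        /* not adjacent in virtual memory */
	return a->msl == b->msl;     /* adjacent, but must share a list */
}
```

With only the first check, two elements that happen to be contiguous in virtual memory but come from different lists would be merged, which is the situation the commit describes.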
Anatoly Burakov	32fc0fa00e	mem: check for contiguousness in external segments For IOVA as VA mode, we assume that memory is contiguous. However, for external segments that assumption may not necessarily hold. Fix the code to not assume that external memory segments are contiguous even in IOVA as VA mode. Fixes: `5282bb1c36` ("mem: allow memseg lists to be marked as external") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-18 14:12:20 +01:00
Kevin Laatz	2ddd89c3c6	eal: fix duplicate function declaration The rte_eal_get_runtime_dir() function is currently being declared in two header files. This API was made public in commit `6911c9fd8f` ("eal: export function to get runtime directory"), adding it to rte_eal.h. To make it public, the 'rte' prefix was added to the function so it needed to be modified in the original location of the declaration, eal_filesystem.h. By only modifying, and not removing the declaration, it is now a duplicate. This patch removes the declaration from eal_filesystem.h. Fixes: `6911c9fd8f` ("eal: export function to get runtime directory") Reported-by: Anatoly Burakov <anatoly.burakov@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-18 13:40:26 +01:00
Thomas Monjalon	3e42b6ce06	version: 18.11-rc3 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-14 05:05:29 +01:00
Fan Zhang	1c25cf4a1c	pipeline: fix logically dead code This patch fixes the coverity issue of logically dead code. Coverity issue: 323523 Fixes: `96303217a6` ("pipeline: add symmetric crypto table action") Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>	2018-11-12 17:45:23 +01:00
Ferruh Yigit	68b931bff2	ethdev: eliminate interim variable `local_conf` variable was needed for offload conversions but no more required. No functional difference, only interim variable eliminated. Fixes: `ab3ce1e0c1` ("ethdev: remove old offload API") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-11-14 00:35:53 +01:00
Wenzhuo Lu	1a411a6fdb	ethdev: fix device info getting The device information cannot be gotten correctly before the configuration is set. Because on some NICs the information has dependence on the configuration. Fixes: `3be82f5cc5` ("ethdev: support PMD-tuned Tx/Rx parameters") Cc: stable@dpdk.org Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-11-14 00:35:53 +01:00
Wenzhuo Lu	aa28ec5d27	ethdev: fix invalid configuration after failure The new configuration is stored during the rte_eth_dev_configure() API but the API may fail. After failure stored configuration will be invalid since it is not fully applied to the device. We better roll the configuration back after failure. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-11-14 00:35:53 +01:00
Tiwei Bie	0541588a44	vhost: remove unneeded null pointer check The caller will guarantee that msg won't be null. Remove the unneeded null pointer check which caused a Coverity warning. Coverity issue: 323484 Fixes: `8f972312b8` ("vhost: support vhost-user") Cc: stable@dpdk.org Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-11-14 00:35:53 +01:00
Fan Zhang	cd1e8f03ab	vhost/crypto: fix packet copy in chaining mode This patch fixes the incorrect packet content copy in the chaining mode. Originally the content before cipher offset is overwritten by all zeros. This patch fixes the problem by making sure the correct write back source and destination settings during set up. Fixes: `3bb595ecd6` ("vhost/crypto: add request handler") Cc: stable@dpdk.org Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-11-14 00:35:53 +01:00
Tiwei Bie	30affaeebc	vhost: fix IOVA access for packed ring We should apply for RO access when receiving packets from the VM and apply for RW access when sending packets to the VM. Fixes: `a922401f35` ("vhost: add Rx support for packed ring") Fixes: `ae999ce49d` ("vhost: add Tx support for packed ring") Cc: stable@dpdk.org Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-11-14 00:35:53 +01:00
Bruce Richardson	f98a95102d	eal/x86: move header to standard BSD license This updates the license on the rte_rtm.h file to be the standard BSD-3-Clause license used for the rest of DPDK, thus bringing the file in compliance with the DPDK licensing policy. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2018-11-14 01:44:14 +01:00
Bruce Richardson	e5f9a65147	eal/x86: reduce contention when retrying TSX When TSX transactions abort, it is generally worth retrying a number of times before falling back to the traditional locking path, as the parallelism benefits from TSX can be worth it when a transaction does succeed. For cases with multiple threads and high contention rates, it can be useful to have increasing delays between retry attempts, so as to avoid having the same threads repeatedly colliding. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2018-11-14 01:03:21 +01:00
Yipeng Wang	606bd11736	hash: fix TSX aborts with newer gcc gcc 7 and 8 with O3 will generate vzeroupper from rte_memcpy into TSX region which may abort the TSX transaction. This fix changes rte_memcpy to memcpy which will not insert extra vzeroupper into the library. Fixes: `f2e3001b53` ("hash: support read/write concurrency") Cc: stable@dpdk.org Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2018-11-14 01:02:07 +01:00
Anatoly Burakov	45e5f49e87	ipc: remove panic in async request EAL should not crash when setting alarm fails. Also, remove the profanity in error message. Fixes: `daf9bfca71` ("ipc: remove thread for async requests") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-14 00:01:38 +01:00
Konstantin Ananyev	95df7307a7	bpf: fix x86 JIT for immediate loads The x86 JIT can generate invalid code for (BPF_LD \| BPF_IMM \| EBPF_DW) instructions when the immediate value is larger than INT32_MAX. Fixes: `cc752e43e0` ("bpf: add JIT compilation for x86_64 ISA") Cc: stable@dpdk.org Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2018-11-13 23:18:53 +01:00
Thomas Monjalon	31f19a9beb	pci: fix parsing of address without function number If the last part of the PCI address (function number) is missing, the parsing was successful, assuming function 0. The call to strtoul is not returning an error in such a case, so an explicit check is inserted before. This bug has always been there in older parsing macros: - GET_PCIADDR_FIELD - GET_BLACKLIST_FIELD Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Reported-by: Wisam Jaddo <wisamm@mellanox.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2018-11-13 17:59:42 +01:00
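The parsing pitfall in the commit above is that `strtoul` reports no error when it consumes no digits. A minimal sketch of an explicit check follows; `struct pci_addr`, `parse_field()` and `pci_parse()` are hypothetical names for illustration, the field limits are assumptions, and this is not the code from the patch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical address type; field widths are assumptions. */
struct pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/*
 * Parse one hexadecimal field that must start with a hex digit, must not
 * exceed max, and must be followed by the terminator term.
 * On success, *s is advanced past the terminator.
 */
static int parse_field(const char **s, unsigned long max, char term,
		       unsigned long *out)
{
	char *end;
	unsigned long v;

	/* Explicit check: an empty field would otherwise parse as 0. */
	if (!isxdigit((unsigned char)**s))
		return -1;
	errno = 0;
	v = strtoul(*s, &end, 16);
	if (errno != 0 || v > max || *end != term)
		return -1;
	*out = v;
	*s = (term == '\0') ? end : end + 1;
	return 0;
}

/* Parse "domain:bus:devid.function"; returns 0 on success, -1 on error. */
static int pci_parse(const char *s, struct pci_addr *a)
{
	unsigned long dom, bus, dev, fn;

	if (parse_field(&s, 0xffff, ':', &dom) < 0 ||
	    parse_field(&s, 0xff, ':', &bus) < 0 ||
	    parse_field(&s, 0x1f, '.', &dev) < 0 ||
	    parse_field(&s, 0x7, '\0', &fn) < 0)
		return -1;
	a->domain = (uint32_t)dom;
	a->bus = (uint8_t)bus;
	a->devid = (uint8_t)dev;
	a->function = (uint8_t)fn;
	return 0;
}
```

In this sketch, `"0000:00:1f.3"` parses successfully, while `"0000:00:1f."` and `"0000:00:1f"` are both rejected instead of silently becoming function 0.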
Honnappa Nagarahalli	9eca8bd7a6	hash: separate lock-free and r/w lock lookup The lock-free algorithm has caused significant lookup performance regression for certain use cases. The regression is attributed to the use of non-relaxed memory orderings. 2 versions of the lookup functions are created. One that uses the RW lock and the one that is lock-free. This restores the performance regression caused for use cases that used RW lock version of the lookup function. Fixes: `e605a1d36` ("hash: add lock-free r/w concurrency") Suggested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>	2018-11-13 17:34:44 +01:00
Gavin Hu	49594a6314	ring/c11: relax ordering for load and store of the head When calling __atomic_compare_exchange_n, use relaxed ordering for the success case, as multiple producers/consumers do not release updates to each other so no need for acquire or release ordering. Because the thread fence in place, ordering for the first iteration can be relaxed. Run the ring perf test on the following testbed: HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core,4 threads/core,2.5GHz OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc gcc: 8.1.0 $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \ --socket-mem=1024 -- -i Without the patch: * Testing using two physical cores * SP/SC bulk enq/dequeue (size: 8): 5.75 MP/MC bulk enq/dequeue (size: 8): 10.18 SP/SC bulk enq/dequeue (size: 32): 1.80 MP/MC bulk enq/dequeue (size: 32): 2.34 With the patch: * Testing using two physical cores * SP/SC bulk enq/dequeue (size: 8): 5.59 MP/MC bulk enq/dequeue (size: 8): 10.54 SP/SC bulk enq/dequeue (size: 32): 1.73 MP/MC bulk enq/dequeue (size: 32): 2.38 No significant improvement, nor regression was seen, as the optimisation is not at the critical path. Fixes: `39368ebfc6` ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>	2018-11-13 17:00:58 +01:00
Gavin Hu	86757c2c3e	ring/c11: keep deterministic order allowing retry to work Use case scenario: 1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some reasons (running out of cpu time, preempted,...) 2) Thread 2 is enqueuing. It succeeds in enqueuing and moves prod.head forward. 3) Thread 3 is dequeuing. It succeeds in dequeuing and moves the cons.tail beyond the prod.head read by thread 1. 4) Thread 1 is re-scheduled. It reads cons.tail. cpu1(producer) cpu2(producer) cpu3(consumer) load r->prod.head ^ load r->prod.head \| load r->cons.tail \| store r->prod.head(+n) stalled <-- enqueue -----> \| store r->prod.tail(+n) \| load r->cons.head \| load r->prod.tail \| store r->cons.head(+n) \| <...dequeue.....> v store r->cons.tail(+n) load r->cons.tail For thread 1, the __atomic_compare_exchange_n detects the outdated prod.head and retry the flow with the new one. This retry flow works ok on strong ordering platform(eg:x86). But for weak ordering platforms(arm, ppc), loading cons.tail and prod.head might be re-ordered, prod.head is new but cons.tail becomes too old, the retry flow, based on the detection of outdated head, does not trigger as expected, thus the outdate cons.tail causes wrong free_entries. Similarly, for dequeuing, outdated prod.tail leads to wrong avail_entries. The fix is to keep the deterministic order of two loads allowing the retry to work. 
Run the ring perf test on the following testbed: HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc gcc: 8.1.0 $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \ --socket-mem=1024 -- -i Without the patch: * Testing using two physical cores * SP/SC bulk enq/dequeue (size: 8): 5.64 MP/MC bulk enq/dequeue (size: 8): 9.58 SP/SC bulk enq/dequeue (size: 32): 1.98 MP/MC bulk enq/dequeue (size: 32): 2.30 With the patch: * Testing using two physical cores * SP/SC bulk enq/dequeue (size: 8): 5.75 MP/MC bulk enq/dequeue (size: 8): 10.18 SP/SC bulk enq/dequeue (size: 32): 1.80 MP/MC bulk enq/dequeue (size: 32): 2.34 The results showed the thread fence degrade the performance slightly, but it is required for correctness. Fixes: `39368ebfc6` ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>	2018-11-13 16:57:58 +01:00
Jerin Jacob	5d08fecdd3	eal: fix build Some toolchains define fls() in string.h with an int argument, which conflicts with the uint32_t argument type used here. /export/dpdk.org/lib/librte_eal/common/rte_reciprocal.c:47:19: error: conflicting types for ‘fls’ static inline int fls(uint32_t x) ^~~ /opt/marvell-tools-201/aarch64-marvell-elf/include/strings.h:59:6: note: previous declaration of ‘fls’ was here int fls(int) __pure2; FreeBSD string.h also has fls() with argument as int type. https://www.freebsd.org/cgi/man.cgi?query=fls&sektion=3 Fixing the conflict by using rte version of fls. Fixes: `ffe3ec811e` ("sched: introduce reciprocal divide") Fixes: `faf2b25c9f` ("fm10k: support VMDQ in multi-queue configuration") Cc: stable@dpdk.org Suggested-by: Thomas Monjalon <thomas@monjalon.net> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>	2018-11-12 13:27:02 +01:00
Jerin Jacob	3a6f2c50b9	eal: introduce rte version of fls The function returns the last (most-significant) bit set. Added unit testcase to verify rte_fls_u32(). Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>	2018-11-12 13:25:01 +01:00
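As an illustration of what a "find last set" helper computes, here is a portable sketch. The name `fls_u32()` is hypothetical, and the 1-based return convention (0 for an input of 0) is an assumption borrowed from BSD `fls(3)`; it is not taken from the patch itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper: position of the most-significant set bit,
 * counted from 1. Returns 0 when no bit is set (assumed convention).
 */
static inline int fls_u32(uint32_t x)
{
	int pos = 0;

	while (x != 0) {
		pos++;
		x >>= 1;
	}
	return pos;
}
```

Taking a `uint32_t` argument under a distinct name avoids the clash with a libc `fls(int)` declaration that the build fix above describes.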
Thomas Monjalon	6bdf144553	eal/x86: remove unused memcpy file The use of rte_memcpy_ptr was removed in revert below, but it was missing removing the file arch/x86/rte_memcpy.c. Fixes: `d35cc1fe6a` ("eal/x86: revert select optimized memcpy at run-time") Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-12 00:11:46 +01:00
Thomas Monjalon	c7ad7754f8	devargs: do not replace already inserted device The devargs of a device can be replaced by a newly allocated one when trying to probe again the same device (multi-process or multi-ports scenarios). This is breaking some pointer references. It can be avoided by copying the new content, freeing the new devargs, and returning the already inserted pointer. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Tested-by: Qi Zhang <qi.z.zhang@intel.com> Tested-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2018-11-12 00:10:21 +01:00
Alejandro Lucero	ee0e074f81	mem: fix DMA mask width sanity check Current code has different max DMA mask width values for 32 and 64 bits systems. IOMMU hardware could report a higher supported width than current MAX_DMA_MASK_BITS when RTE_ARCH_64 is not defined. This is actually true with a 32 bits kernel running in a 64 bits server with IOMMU hardware. This could also be a problem with embedded systems using an IOMMU designed for 64 bits in a 32 bits system. This patch leaves a single max DMA mask width which will make sure the mask width is within the range for 64 bits variables used for DMA mask. This also will avoid wrong values because any value higher than 64 bits is likely wrong. Fixes: `223b7f1d5e` ("mem: add function for checking memseg IOVA") Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-07 14:42:28 +01:00
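A minimal sketch of a width sanity check followed by an address range check is shown below. `MAX_MASK_BITS` and `check_dma_mask()` are hypothetical names and the limit of 63 is an assumption chosen so the shift stays defined for a 64-bit variable; this is not the EAL implementation. Note that the argument is the number of address bits, not the mask itself.

```c
#include <stdint.h>

/* Hypothetical single limit, independent of 32-bit or 64-bit builds. */
#define MAX_MASK_BITS 63

/*
 * Returns 0 if the range [iova, iova + len) is addressable with
 * maskbits address bits, -1 otherwise (including invalid arguments).
 */
static int check_dma_mask(uint64_t iova, uint64_t len, unsigned int maskbits)
{
	uint64_t last, forbidden;

	/* Sanity check on the width; also keeps the shift below defined. */
	if (maskbits == 0 || maskbits > MAX_MASK_BITS)
		return -1;
	if (len == 0)
		return -1;
	last = iova + len - 1;
	if (last < iova)
		return -1;  /* range wraps around the address space */

	/* Bits that a device limited to maskbits cannot drive. */
	forbidden = ~((UINT64_C(1) << maskbits) - 1);
	return ((iova | last) & forbidden) ? -1 : 0;
}
```

For example, with a 32-bit width a range ending at `0xffffffff` is accepted, while a range that spills one byte past it is rejected.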
Anatoly Burakov	4531d096d1	mem: fix use after free in legacy mem init Adding an additional failure path in DMA mask check has exposed an issue where `hugepage` pointer may point to memory that has already been unmapped, but pointer value is still not NULL, so failure handler will attempt to unmap it second time if DMA mask check fails. Fix it by setting `hugepage` pointer to NULL once it is no longer needed. Coverity issue: 325730 Fixes: `165c89b845` ("mem: use DMA mask check for legacy memory") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-07 00:06:38 +01:00
Thomas Monjalon	c59b06294f	version: 18.11-rc2 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-06 03:27:49 +01:00
Konstantin Ananyev	b8d5dfd4a5	ip_frag: use key length for key comparison Right now reassembly code relies on src_dst[] being all zeroes to determine whether an entry in the fragments table is free or occupied. This is suboptimal and error prone - user can crash DPDK ip_reassembly app by something like the following scapy script: x=Ether(src=...,dst=...)/IP(dst='0.0.0.0',src='0.0.0.0',id=0)/('X'*1000) frags=fragment(x, fragsize=500) sendp(frags, iface=...) To overcome that issue and reduce overhead of 'key invalidate' and 'key is empty' operations - add key_len into keys comparison procedure. Fixes: `4f1a8f6338` ("ip_frag: add IPv6 reassembly") Cc: stable@dpdk.org Reported-by: Ryan E Hall <ryan.e.hall@intel.com> Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2018-11-06 01:58:11 +01:00
Konstantin Ananyev	7f0983ee33	ip_frag: check fragment length of incoming packet Under some conditions ill-formed fragments might cause reassembly code to corrupt mbufs and/or crash. Let say the following fragments sequence: <ofs=0,len=100, flags=MF> <ofs=96,len=100, flags=MF> <ofs=200,len=0,flags=MF> <ofs=200,len=100,flags=0> can trigger the problem. To overcome such situation, added check that fragment length of incoming value is greater than zero. Fixes: `601e279df0` ("ip_frag: move fragmentation/reassembly headers into a library") Fixes: `4f1a8f6338` ("ip_frag: add IPv6 reassembly") Cc: stable@dpdk.org Reported-by: Ryan E Hall <ryan.e.hall@intel.com> Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2018-11-06 01:58:03 +01:00
Ferruh Yigit	7b178300ac	vhost: fix possible out of bound access Fixes: `d7280c9fff` ("vhost: support selective datapath") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-11-06 01:14:23 +01:00
Ferruh Yigit	c8b506e4b6	service: fix possible null access Fixes: `21698354c8` ("service: introduce service cores concept") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-11-06 01:14:15 +01:00
Ferruh Yigit	9eb0688412	lib: fix shifting 32-bit signed variable 31 times Fix cppcheck warning by marking variable as unsigned. Fixes: `dc276b5780` ("acl: new library") Fixes: `986ff526fb` ("net: add CRC computation API") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-06 01:14:05 +01:00
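The warning addressed above comes from shifting a signed 32-bit value by 31, which is undefined behaviour in C because the result does not fit in `int`. A minimal sketch of the unsigned form follows; `bit32()` is a hypothetical helper name, not something from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: single-bit mask for bit positions 0..31.
 * The operand is unsigned, so the shift is well defined for n == 31,
 * whereas (1 << 31) on a 32-bit signed int is not.
 */
static inline uint32_t bit32(unsigned int n)
{
	return (uint32_t)1 << n;
}
```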
Thomas Monjalon	1ccdc31793	ethdev: remove experimental tag for iterator API After removing the function rte_eth_dev_attach(), there are two replacement solutions possible: one using probe event notification, and one using a new iterator. So the application can get the new probed ports either asynchronously or synchronously. The iterator API is new in DPDK 18.11 so they got the experimental tag by policy. It causes an issue for strict applications which do not use experimental functions, and want to use the synchronous method. The replacement for removed API should not be experimental. That's why the experimental status of the ethdev iterator is removed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Kevin Traynor <ktraynor@redhat.com> Tested-by: Kevin Traynor <ktraynor@redhat.com>	2018-11-06 01:14:04 +01:00
Thomas Monjalon	d75d132c30	eal: remove experimental tag for probe/remove The functions rte_dev_probe() and rte_dev_remove() are new in DPDK 18.11 so they got the experimental tag by policy. However they are too much basic functions for being skipped by strict applications which do not use experimental functions. The alternative is to use rte_eal_hotplug_add() and rte_eal_hotplug_remove(), but their API requires the application to parse the devargs string in order to provide bus name, device name and driver arguments. The new function rte_dev_probe() is really simpler to use and more flexible by accepting any devargs string. Let's encourage applications to use it. The old functions rte_eal_hotplug_* may be deprecated later. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Kevin Traynor <ktraynor@redhat.com> Tested-by: Kevin Traynor <ktraynor@redhat.com>	2018-11-06 01:14:02 +01:00
Anatoly Burakov	1ccfeb7df7	malloc: fix invalid argument handling When adding memory to an external heap, do not go to unlock failure handler because the memory hotplug lock hasn't been taken out yet. Fixes: `7d75c31014` ("malloc: allow adding memory to named heaps") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-06 01:13:58 +01:00
Fan Zhang	d09328567e	vhost/crypto: fix inferred misuse of enum Fix inferred misuse of enum rte_crypto_cipher_algorithm and rte_crypto_auth_algorithm Coverity issue: 277202 Fixes: `e80a987081` ("vhost/crypto: add session message handler") Cc: stable@dpdk.org Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 15:01:25 +01:00
Ferruh Yigit	11745065a5	ethdev: fix redundant function pointer check RTE_FUNC_PTR_OR_ERR_RET() already does the `ethdev_uninit` NULL check. Fixes: `e489007a41` ("ethdev: add generic create/destroy ethdev APIs") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-05 15:01:25 +01:00
Maxime Coquelin	708e14d8b9	vhost: advertize packed ring layout support Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com> Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>	2018-11-05 15:01:25 +01:00
Maxime Coquelin	2ce8b8973d	vhost: add packed ring support to vring base requests For packed ring layout, we need to save the avail index and its wrap counter value. At restore time, the used index and its wrap counter are set to the available ones, as the ring processing is stopped at vring base get time. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com> Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>	2018-11-05 15:01:25 +01:00
Shahaf Shuler	be685863a9	net: fix build with pedantic The following error popped when compiling with -pedantic: In file included from drivers/net/mlx5/mlx5_flow_dv.c:28:0: include/rte_gre.h:20:2: error: type of bit-field 'res2' is a GCC extension [-Werror=pedantic] uint16_t res2:4; /**< Reserved */ Fixing by adding the __extension__ attribute. Fixes: `894f71a380` ("net: add GRE header structure") Cc: stable@dpdk.org Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 15:01:25 +01:00
Gavin Hu	047adc1724	ring/c11: move atomic load of head above the loop In __rte_ring_move_prod_head, move the __atomic_load_n up and out of the do {} while loop as upon failure the old_head will be updated, another load is costly and not necessary. This helps a little on the latency,about 1~5%. Test result with the patch(two cores): SP/SC bulk enq/dequeue (size: 8): 5.64 MP/MC bulk enq/dequeue (size: 8): 9.58 SP/SC bulk enq/dequeue (size: 32): 1.98 MP/MC bulk enq/dequeue (size: 32): 2.30 Fixes: `39368ebfc6` ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Jia He <justin.he@arm.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2018-11-05 14:34:27 +01:00
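The reason the load can move above the loop is that `__atomic_compare_exchange_n` writes the current value back into its "expected" argument when it fails. A minimal sketch, assuming GCC or Clang builtins: `reserve()` is a hypothetical function, and it deliberately omits the free-space computation and the fences that the neighbouring ring/c11 commits discuss.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: advance *head by n and return the old value.
 * The initial load happens once, outside the loop. On CAS failure the
 * builtin stores the observed value into old_head, so no reload is needed.
 */
static uint32_t reserve(uint32_t *head, uint32_t n)
{
	uint32_t old_head = __atomic_load_n(head, __ATOMIC_RELAXED);
	uint32_t new_head;

	do {
		new_head = old_head + n;
	} while (!__atomic_compare_exchange_n(head, &old_head, new_head,
					      0, __ATOMIC_RELAXED,
					      __ATOMIC_RELAXED));
	return old_head;
}
```

Relaxed ordering is used here only to keep the sketch small; a real ring also has to order the head and tail loads against each other.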
Gavin Hu	9ed8770628	ring/c11: synchronize load and store of the tail Synchronize the load-acquire of the tail and the store-release within update_tail, the store release ensures all the ring operations, enqueue or dequeue, are seen by the observers on the other side as soon as they see the updated tail. The load-acquire is needed here as the data dependency is not a reliable way for ordering as the compiler might break it by saving to temporary values to boost performance. When computing the free_entries and avail_entries, use atomic semantics to load the heads and tails instead. The patch was benchmarked with test/ring_perf_autotest and it decreases the enqueue/dequeue latency by 5% ~ 27.6% with two lcores, the real gains are dependent on the number of lcores, depth of the ring, SPSC or MPMC. For 1 lcore, it also improves a little, about 3 ~ 4%. It is a big improvement, in case of MPMC, with two lcores and ring size of 32, it saves latency up to (3.26-2.36)/3.26 = 27.6%. This patch is a bug fix, while the improvement is a bonus. In our analysis the improvement comes from the cacheline pre-filling after hoisting load-acquire from __atomic_compare_exchange_n up above. 
The test command: $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=\ 1024 -- -i Test result with this patch(two cores): SP/SC bulk enq/dequeue (size: 8): 5.86 MP/MC bulk enq/dequeue (size: 8): 10.15 SP/SC bulk enq/dequeue (size: 32): 1.94 MP/MC bulk enq/dequeue (size: 32): 2.36 In comparison of the test result without this patch: SP/SC bulk enq/dequeue (size: 8): 6.67 MP/MC bulk enq/dequeue (size: 8): 13.12 SP/SC bulk enq/dequeue (size: 32): 2.04 MP/MC bulk enq/dequeue (size: 32): 3.26 Fixes: `39368ebfc6` ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Jia He <justin.he@arm.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2018-11-05 14:34:19 +01:00
Fiona Trahe	30fadd8bc9	compressdev: fix op allocation Fixed bad logic in rte_comp_op_alloc() checking return value from rte_comp_op_raw_bulk_alloc(). This could have resulted in a seg-fault in error case. Made rte_comp_op_bulk_alloc() code consistent with rte_comp_op_alloc(). Fixes: `96086db5a3` ("compressdev: add operation management") Cc: stable@dpdk.org Reported-by: Sabyasachi Sengupta <sabyasg@hpe.com> Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Shally Verma <shally.verma@caviumnetworks.com>	2018-11-02 12:25:39 +01:00
Fiona Trahe	1fca14d7dd	compressdev: clarify usage of op structure Add note on usage of op structure and when it can be accessed and freed. Fixes: `63f4bfd532` ("compressdev: add enqueue/dequeue functions") Cc: stable@dpdk.org Signed-off-by: Fiona Trahe <fiona.trahe@intel.com> Acked-by: Shally Verma <shally.verma@caviumnetworks.com>	2018-11-02 12:25:39 +01:00
Alejandro Lucero	84e7477e10	mem: add thread unsafe version for DMA mask check During memory initialization calling rte_mem_check_dma_mask leads to a deadlock because memory_hotplug_lock is locked by a writer, the current code in execution, and rte_memseg_walk tries to lock as a reader. This patch adds a thread_unsafe version which will call the final function specifying the memory_hotplug_lock does not need to be acquired. The patch also modified rte_mem_check_dma_mask as a intermediate step which will call the final function as before, implying memory_hotplug_lock will be acquired. PMDs should always use the version acquiring the lock with the thread_unsafe one being just for internal EAL memory code. Fixes: `223b7f1d5e` ("mem: add function for checking memseg IOVA") Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:02:14 +01:00
Alejandro Lucero	165c89b845	mem: use DMA mask check for legacy memory If a device reports addressing limitations through a dma mask, the IOVAs for mapped memory needs to be checked out for ensuring correct functionality. Previous patches introduced this DMA check for main memory code currently being used but other options like legacy memory and the no hugepages option need to be also considered. This patch adds the DMA check for those cases. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:02:13 +01:00
Alejandro Lucero	4374ebc24b	malloc: modify error message for DMA mask check If the DMA mask check shows mapped memory outside the supported range specified by the DMA mask, nothing can be done but return an error and report it. This can mean the app is not executed at all, or that dynamic memory allocation is precluded once the app is running. In any case, we can advise the user to force IOVA as PA if IOVA is currently VA and the user is root. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:02:11 +01:00
Alejandro Lucero	9d15773606	mem: add function for setting DMA mask This patch adds the possibility of setting a dma mask to be used once the memory initialization is done. This is currently needed when IOVA mode is set by PCI related code and an x86 IOMMU hardware unit is present. Current code calls rte_mem_check_dma_mask but it is wrong to do so at that point because the memory has not been initialized yet. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:02:04 +01:00
Alejandro Lucero	0de9eb6138	mem: rename DMA mask check with proper prefix Current name rte_eal_check_dma_mask does not follow the naming used in the rest of the file. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:01:54 +01:00
Alejandro Lucero	af0aa2357d	malloc: fix DMA mask check The param needs to be the maskbits and not the mask. Fixes: `223b7f1d5e` ("mem: add function for checking memseg IOVA") Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:01:43 +01:00
Ferruh Yigit	3370975b99	eal: fix build with gcc 9.0 build error: In function ‘eal_plugin_add’, .../lib/librte_eal/common/eal_common_options.c:225:2: error: ‘strncpy’ output may be truncated copying 4095 bytes from a string of length 4095 [-Werror=stringop-truncation] strncpy(solib->name, path, PATH_MAX-1); strncpy may produce a string that is not null-terminated; replaced it with strlcpy Fixes: `f9a08f6502` ("eal: add support for shared object drivers") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-04 22:48:04 +01:00
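The difference between the two copy styles can be sketched with a small helper that follows strlcpy-like semantics. `copy_bounded()` is a hypothetical name written for this illustration (plain `strlcpy` is not available in every libc), and it is not the function used by the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper with strlcpy-like semantics: copies at most
 * size - 1 bytes and always null-terminates when size > 0.
 * Returns strlen(src), so a result >= size signals truncation.
 */
static size_t copy_bounded(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size != 0) {
		size_t n = (len < size - 1) ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

By contrast, `strncpy(dst, src, n)` writes no terminator when `src` has `n` or more characters, which is the case the compiler warning points at.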
Jerin Jacob	11b57c6980	eal: fix error string function The errno_autotest testcase has failed since commit `5d7b673d5f` ("mk: build with _GNU_SOURCE defined by default") RTE>>errno_autotest rte_strerror: 'Unknown error 11', strerror: 'Resource temporarily unavailable' Test Failed There are two different versions of strerror_r() based on the _GNU_SOURCE definition. /* XSI-compliant */ int strerror_r(int errnum, char *buf, size_t buflen); /* GNU-specific */ char *strerror_r(int errnum, char *buf, size_t buflen); Since the GNU-specific version returns char *, the existing "if" condition around the strerror_r fails. Switching back to XSI-compliant version to allow a) Portable strerror_r() usage as musl c library uses non GNU specific version https://git.musl-libc.org/cgit/musl/tree/src/string/strerror_r.c b) Based on strerror_r(3) man page, it is possible that GNU-specific version need not use char *buf to fill error message instead it can use the immutable static string from the library and return it. note from strerror_r(3) man page: The GNU-specific strerror_r() returns a pointer to a string containing the error message. This may be either a pointer to a string that the function stores in buf, or a pointer to some (immutable) static string (in which case buf is unused). Fixes: `5d7b673d5f` ("mk: build with _GNU_SOURCE defined by default") Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-04 22:25:20 +01:00
Luca Boccassi	349ac52bbc	eal/linux: handle UIO read failure in interrupt handler If a device is unplugged while an interrupt is pending, the read call to the uio device to remove it from the poll wait list can fail resulting in it being continually polled forever. This change checks for the read failing and if so, unregisters the device as an interrupt source and causes the wait list to be rebuilt. This race has been reported and observed in production. Fixes: `0a45657a67` ("pci: rework interrupt handling") Cc: stable@dpdk.org Signed-off-by: Brian Russell <brussell@brocade.com> Signed-off-by: Luca Boccassi <bluca@debian.org>	2018-11-02 10:50:49 +01:00
Darek Stojaczyk	95781f4c64	eal: fix memory leak on multi-process hotplug rollback Fixes: `244d513071` ("eal: enable hotplug on multi-process") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>	2018-11-02 00:05:49 +01:00
Darek Stojaczyk	04854a39e6	eal: fix IPC memory leak on device hotplug rte_mp_request_sync() says that the caller is responsible for freeing one of its parameters afterwards. EAL didn't do that, causing a memory leak. Fixes: `244d513071` ("eal: enable hotplug on multi-process") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-31 19:16:42 +01:00
Thomas Monjalon	bdbe62df10	version: 18.11-rc1 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-10-29 04:08:26 +01:00
Ferruh Yigit	8298310ffa	lib: reduce global variable usage Some global variables can be eliminated, since they are not part of public interface, it is free to remove them. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>	2018-10-29 02:34:27 +01:00
Ferruh Yigit	9757358342	fix global variable issues Various fixes related to the global variable usage. Fixes: `43e610bb85` ("compress/octeontx: introduce octeontx zip PMD") Fixes: `c378f084d6` ("compress/octeontx: add device setup ops") Fixes: `b43ebc65aa` ("compress/octeontx: create private xform") Fixes: `b1ce8ebd97` ("eventdev: add PMD callbacks for eth Rx adapter") Fixes: `3810ae4357` ("eventdev: add interrupt driven queues to Rx adapter") Fixes: `fefed3d1e6` ("enic: new driver") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Reviewed-by: Nikhil Rao <nikhil.rao@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>	2018-10-29 02:34:27 +01:00
Ferruh Yigit	b74fd6b842	add missing static keyword to globals Some global variables can indeed be static, add static keyword to them. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>	2018-10-29 02:01:08 +01:00
Darek Stojaczyk	6bcb7c95fe	vfio: share default container in multi-process So far each process in MP used to have a separate container and relied on the primary process to register all memsegs. Mapping external memory via rte_vfio_container_dma_map() in secondary processes was broken, because the default (process-local) container had no groups bound. There was even no way to bind any groups to it, because the container fd was deeply encapsulated within EAL. This patch introduces a new SOCKET_REQ_DEFAULT_CONTAINER message type for MP synchronization, makes all processes within a MP party use a single default container, and hence fixes rte_vfio_container_dma_map() for secondary processes. From what I checked this behavior was always the same, but started to be invalid/insufficient once mapping external memory was allowed. While here, fix up the comment on rte_vfio_get_container_fd(). This function always opens a new container, never reuses an old one. Fixes: `73a6390859` ("vfio: allow to map other memory regions") Cc: stable@dpdk.org Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-29 01:59:48 +01:00
Darek Stojaczyk	88e2d78a20	vfio: fix read of freed memory on getting container fd We were reading some memory just after freeing it. Fixes: `83a73c5fef` ("vfio: use generic multi-process channel") Cc: stable@dpdk.org Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-29 01:59:48 +01:00
Dariusz Stojaczyk	4f5519ed83	vfio: cleanup getting group fd Factor out duplicated code. Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-29 01:58:32 +01:00
Dariusz Stojaczyk	db9d32b8b7	vfio: check if group fd is already open Always attempt to find already opened fd for an iommu group as subsequent attempts to open it will fail. There's no public API to check if a group was already bound and has a container, so rte_vfio_container_group_bind() shouldn't fail in such case. Fixes: `ea2dc10668` ("vfio: add multi container support") Cc: stable@dpdk.org Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com> Acked-by: Xiao Wang <xiao.w.wang@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-29 01:58:31 +01:00
Eric Zhang	075b182b54	eal: force IOVA to a particular mode This patch uses EAL option "--iova-mode" to force the IOVA mode to a particular value. There exist virtual devices that are not directly attached to the PCI bus, and therefore the auto detection of the IOVA mode based on probing the PCI bus and IOMMU configuration may not report the required addressing mode. Using the EAL option permits the mode to be explicitly configured in this scenario. Signed-off-by: Eric Zhang <eric.zhang@windriver.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Marko Kovacevic <marko.kovacevic@intel.com>	2018-10-29 00:01:05 +01:00
Santosh Shukla	783667c9f9	eal: add --iova-mode option Some users do not want to use the bus IOVA scheme and want to override it. For that, add the EAL option --iova-mode=<string>, where the valid input string is 'pa' or 'va'. Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Signed-off-by: Eric Zhang <eric.zhang@windriver.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-28 23:41:26 +01:00
Takeshi Yoshimura	998c89f148	vfio: fix sPAPR IOMMU mapping Commit `73a6390859` ("vfio: allow to map other memory regions") introduced a bug in sPAPR IOMMU mapping. The commit removed necessary ioctl with VFIO_IOMMU_SPAPR_REGISTER_MEMORY. Also, vfio_spapr_map_walk should call vfio_spapr_dma_do_map instead of vfio_spapr_dma_mem_map. Fixes: `73a6390859` ("vfio: allow to map other memory regions") Cc: stable@dpdk.org Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>	2018-10-28 22:33:27 +01:00
Alejandro Lucero	1df2170287	mem: use address hint for mapping hugepages Linux kernel uses a really high address as starting address for serving mmaps calls. If there exist addressing limitations and IOVA mode is VA, this starting address is likely too high for those devices. However, it is possible to use a lower address in the process virtual address space as with 64 bits there is a lot of available space. This patch adds an address hint as starting address for 64 bits systems and increments the hint for next invocations. If the mmap call does not use the hint address, repeat the mmap call using the hint address incremented by page size. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-28 22:06:05 +01:00
Alejandro Lucero	223b7f1d5e	mem: add function for checking memseg IOVA A device can suffer addressing limitations. This function checks memsegs have iovas within the supported range based on dma mask. PMDs should use this function during initialization if device suffers addressing limitations, returning an error if this function returns memsegs out of range. Another usage is for emulated IOMMU hardware with addressing limitations. It is necessary to save the most restricted dma mask for checking out memory allocated dynamically after initialization. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-28 22:04:34 +01:00
Darek Stojaczyk	c7810c319d	malloc: check size hint when reserving the biggest element RTE_MEMZONE_SIZE_HINT_ONLY wasn't checked in any way, causing size hints to be parsed as hard requirements. This resulted in some allocations being failed prematurely. Fixes: `68b6092bd3` ("malloc: allow reserving biggest element") Cc: stable@dpdk.org Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-28 11:59:02 +01:00
Ziye Yang	e4f2c1421d	eal/linux: fix memory leak of logid This patch is used to fix the memory leak issue of logid. We use the ASAN test in SPDK when integrating DPDK and find this memory leak issue. Fixes: `d8a2bc71df` ("log: remove app path from syslog id") Cc: stable@dpdk.org Signed-off-by: Ziye Yang <ziye.yang@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-28 11:42:18 +01:00
Li Han	8721e07478	ip_frag: fix overflow in key comparison In struct ip_frag_key, the type of src_dst[] is uint64_t, but "val", which stores the calculated result, is uint32_t, so the high 32 bits of the key may be lost. Also, the function's return value is int, but it never returns a value < 0. Signed-off-by: Li Han <han.li1@zte.com.cn> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2018-10-28 11:16:49 +01:00
Luca Boccassi	085766aa67	build: change default driver installation directory As part of the effort of consolidating the DPDK installation bits and pieces across distros, set the default directory of lib/ where PMDs get installed to dpdk/pmds-XX.YY. It's necessary to have a versioned subdirectory as multiple ABI revisions might be installed at the same time, so having a fixed name will cause trouble with the autoload feature. Small refactor with parsing and saving the major version to a variable, since it's now used in 3 different places. Signed-off-by: Luca Boccassi <bluca@debian.org> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Timothy Redaelli <tredaelli@redhat.com>	2018-10-27 23:22:12 +02:00
Kevin Laatz	57ae0ec626	build: add dependency on telemetry to apps with meson This patch adds telemetry as a dependency to all applications. Without these changes, the --telemetry flag will not be recognised and applications will fail to run if they want to enable telemetry. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:21:33 +02:00
Ciara Power	c8e76f5ac3	telemetry: add ability to disable selftest This patch adds functionality to enable/disable the selftest. This functionality will be extended in future to make the enabling/disabling more dynamic and remove this 'hardcoded' approach. We are temporarily using this approach due to the design changes (vdev vs eal) made to the library. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	0fe3a37924	telemetry: format json response when sending stats This patch adds functionality to create a JSON message in order to send it to a client socket. When stats are requested by a client, they are retrieved from the metrics library and encoded in JSON format. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	67c3c2de48	telemetry: update metrics before sending stats This patch adds functionality to update the statistics in the metrics library with values from the ethdev stats. Values need to be updated before they are encoded into a JSON message and sent to the client that requested them. The JSON encoding will be added in a subsequent patch. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	1b756087db	telemetry: add parser for client socket messages This patch adds the parser file. This is used to parse any messages that are received on any of the client sockets. Currently, the unregister functionality works using the parser. Functionality relating to getting statistic values for certain ports will be added in a subsequent patch, however the parsing involved for that command is added in this patch. Some of the parser code included is in preparation for future functionality, that is not implemented yet in this patchset. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	ee5ff0d329	telemetry: add client feature and sockets This patch introduces clients to the telemetry API. When a client makes a connection through the initial telemetry socket, they can send a message through the socket to be parsed. Register messages are expected through this socket, to enable clients to register and have a client socket set up for future communications. A TAILQ is used to store all client information. Using this, the client sockets are polled for messages, which will later be parsed and dealt with accordingly. Functionality that makes use of the client sockets is also introduced in this patch, such as writing to client sockets and sending error responses. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	fdbdb3f9ce	telemetry: add initial connection socket This patch adds the telemetry UNIX socket. It is used to allow connections from external clients. On the initial connection from a client, ethdev stats are registered in the metrics library, to allow for their retrieval at a later stage. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	8877ac688b	telemetry: introduce infrastructure This patch adds the infrastructure and initial code for the telemetry library. The telemetry init is registered with eal_init(). We can then check to see if --telemetry was passed as an eal option. If --telemetry was parsed, then we call telemetry init at the end of eal init. Control threads are used to get CPU cycles for telemetry, which are configured in this patch also. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:20 +02:00
Kevin Laatz	6911c9fd8f	eal: export function to get runtime directory This patch makes the eal_get_runtime_dir() API public so it can be used from outside EAL. Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 12:10:24 +02:00
Kevin Laatz	2395332798	eal: add option register infrastructure This commit adds infrastructure to EAL that allows an application to register its init function with EAL. This allows libraries to be initialized at the end of EAL init. This infrastructure allows libraries that depend on EAL to be initialized as part of EAL init, removing circular dependency issues. Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2018-10-27 12:10:10 +02:00
Stephen Hemminger	f4336b4388	ethdev: make offload name API non-experimental The offload name functions are useful, but since they are marked experimental they can not be used by upstream projects. For example, VPP duplicates the same table in its code. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Ilya Maximets	a51639cc72	eal: add nanosleep based delay function Add a new rte_delay_us_sleep() function that uses nanosleep(). This function can be used by applications so that they do not have to implement their own nanosleep() based callback, and by internal DPDK code if a CPU non-blocking delay is needed. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	8fae42404c	ethdev: fix iterator default behaviour for representors The iterator was matching all representors if it was not specified in the devargs string. It was a wrong default behaviour. If there is no representor parameter in the devargs, the iterator should not match any representor port. The implementation of the default behaviour would be simpler if a "no match" handler is added to rte_kvargs_process(). As it requires an API breakage, it will be reworked later. Fixes: `a7d3c6271d` ("ethdev: support representor id as iterator filter") Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	336f20bc5e	ethdev: filter destroy event before probed If a port is being created and rolled back because of an error, the event RTE_ETH_EVENT_DESTROY should not be sent. It makes no sense to receive a destroy event for a port which was not yet announced via RTE_ETH_EVENT_NEW. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Olivier Matz	140af04e63	net: support MPLS in software packet type parser Add RTE_PTYPE_L2_ETHER_MPLS packet type support in rte_net_get_ptype(). Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Olivier Matz	e480cf487a	net: add MPLS header structure Add the MPLS header structure in librte_net. It will be used by the next patch, which adds support for the MPLS L2 layer in the software packet type parser. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Tiwei Bie	0dcdf32e64	vhost: initialize postcopy ufd properly Currently, postcopy_ufd is initialized to 0 implicitly, so fd 0 could be closed unexpectedly by vhost_backend_cleanup(). Fix this issue by initializing postcopy_ufd to -1 explicitly. Fixes: `9eefef3b59` ("vhost: introduce postcopy advise message") Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-10-26 22:14:06 +02:00
Maxime Coquelin	e988a6d845	vhost: avoid memory barriers when no descriptors dequeued In both split and packed dequeue paths, the flush_shadow_used_ring and vhost_ring_call variants get called even if no packets have been dequeued, and so no descriptor updates happened. It has an impact on CPU pipeline, as memory barriers are used in these functions. This patch does not call these functions if no descriptors have been dequeued. The performance gain with split ring when dequeue zero-copy is disabled should be null, but should be noticeable with packed ring or dequeue zero-copy enabled. Fixes: `ae999ce49d` ("vhost: add Tx support for packed ring") Fixes: `915cf94042` ("vhost: use shadow used ring in dequeue path") Cc: stable@dpdk.org Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com> Tested-by: Jens Freimann <jfreimann@redhat.com> Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	c10cdce180	ethdev: support MAC address as iterator filter The MAC addresses of a port can be matched with devargs. As the conflict between rte_ether.h and netinet/ether.h is not resolved, the MAC parsing is done with a rte_cmdline function. As a result, cmdline library becomes a dependency of ethdev. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	a7d3c6271d	ethdev: support representor id as iterator filter The representor id is added in rte_eth_dev_data in order to be able to match a port with its representor id in devargs. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	7f07e7d794	ethdev: move representor parsing functions The functions for representor devargs parsing were static in the file rte_ethdev.c. In order to reuse them in the file rte_class_eth.c, they are moved to the files ethdev_private.c/.h. A log is fixed by adding a missing line feed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	cc0579f233	kvargs: support list value If a value contains a comma, rte_kvargs_tokenize() will split here. In order to support list syntax [a,b] as value, an extra parsing of the square brackets is added. Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	01e5b16c57	eal: remove deprecated attach/detach functions These hotplug functions were deprecated and have some new replacements. As announced earlier, the oldest ones are now removed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	c9cce42876	ethdev: remove deprecated attach/detach functions The hotplug attach/detach features are implemented in EAL layer. There is a new ethdev iterator to retrieve ports from ethdev layer. As announced earlier, the (buggy) ethdev functions are now removed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	8b9ea3b3ca	ethdev: allow iterating with pure class filter If no rte_device is given in the iterator, eth_dev_match() is looking at all ports without any restriction, except the ethdev kvargs filter. It allows to iterate with a devargs filter referencing only some ethdev parameters. The format (from the new devargs syntax) is: class=eth,paramY=Y Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	214ed1acd1	ethdev: add iterator to match devargs input The iterator will return the ethdev port ids matching a devargs string. It is recommended to use the macro RTE_ETH_FOREACH_MATCHING_DEV() for usage convenience. The class string is prefixed with '+' in order to skip the validation of the parameter keys. It is tolerated for the compatibility with the old (current) syntax where all parameters (bus, class and driver) are mixed in the same string without any delimiter. Thanks to this compatibility prefix, the driver parameters will be skipped during the ethdev parsing, and not considered invalid. A macro is introduced in rte_common.h to workaround a const field. This hack is needed to free const strings in the iterator. It is preferred to keep the const for these fields, because it gives a hint that they are not changed at each iteration. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Tiwei Bie	16b9e38e74	vhost: fix vector filling for packed ring We should return the length of the buffers described by the current descriptor chain after filling the buffer vector. So we need to zero the *len first. Fixes: `2f3225a7d6` ("vhost: add vector filling support for packed ring") Cc: stable@dpdk.org Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-10-26 22:14:05 +02:00
Timothy Redaelli	1726e9994c	vhost/crypto: fix shared lib build without cryptodev Currently it's not possible to build DPDK as shared library with cryptodev disabled since vhost is trying to link with rte_crypto, but rte_crypto and rte_hash are only needed when you build vhost_crypto and so only when cryptodev is enabled. This patch fixes this by linking rte_vhost with rte_crypto and rte_hash only when cryptodev is enabled. Fixes: `b4ca812986` ("vhost/crypto: fix build without cryptodev") Fixes: `939066d965` ("vhost/crypto: add public function implementation") Cc: stable@dpdk.org Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-10-26 22:14:05 +02:00
Ori Kam	7307cf6333	ethdev: add raw encapsulation action Currently the encap/decap actions only support encapsulation of VXLAN and NVGRE L2 packets (L2 encapsulation is where the inner packet has a valid Ethernet header, while L3 encapsulation is where the inner packet doesn't have the Ethernet header). In addition, the parameter to the encap action is a list of rte items, which results in 2 extra translations: from the application to the action, and from the action to the NIC. This has a negative impact on the insertion performance. Looking forward, there is going to be a need to support many more tunnel encapsulations, for example MPLSoGRE and MPLSoUDP. Adding the new encapsulations will result in duplication of code. For example, the code for handling NVGRE and VXLAN is exactly the same, and each new tunnel will have the same exact structure. This patch introduces a raw encapsulation that can support L2 tunnel types and L3 tunnel types. In addition, the new encapsulation commands use a raw buffer in order to save the conversion time, both for the application and the PMD. In order to encapsulate an L3 tunnel type there is a need to use both actions in the same rule: the decap to remove the L2 of the original packet, and then the encap command to encapsulate the packet with the tunnel. For decap L3 there is also a need to use both commands in the same flow: first the decap command to remove the outer tunnel header, and then encap to add the L2 header. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Dekel Peled	839b20be0e	ethdev: support metadata as flow rule criteria As described in [1], a new rte_flow item is added to support metadata to use as flow rule match pattern. The metadata is an opaque item, fully controlled by the application. The use of metadata is relevant for egress rules only. It can be set in the flow rule using the RTE_FLOW_ITEM_META. An additional member 'tx_metadata' is added in union with existing member 'hash' of struct 'rte_mbuf', located to avoid conflicts with existing fields. This additional member is used to carry the metadata item. Application should set the packet metadata in the mbuf dedicated field, and set the PKT_TX_METADATA flag in the mbuf->ol_flags. The NIC will use the packet metadata as match criteria for relevant flow rules. This patch introduces metadata item type for rte_flow RTE_FLOW_ITEM_META, along with corresponding struct rte_flow_item_meta and ol_flag PKT_TX_METADATA. [1] "[RFC,v2] ethdev: support metadata as flow rule criteria" Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	23ea57a2a0	ethdev: complete closing of port After closing a port, it cannot be restarted. So there is no reason to not free all associated resources. The last step was done with rte_eth_dev_detach() which is deprecated. Instead of blindly removing the associated rte_device, the driver should check if no more port (ethdev, cryptodev, etc) is open for the device. The last ethdev freeing steps, which were done by rte_eth_dev_detach(), are now done at the end of rte_eth_dev_close() if the driver supports the flag RTE_ETH_DEV_CLOSE_REMOVE. There will be a transition period for PMDs to enable this new flag and migrate to the new behaviour. When enabling RTE_ETH_DEV_CLOSE_REMOVE, the PMD must free all its private resources for the port, in its dev_close function. It is advised to call the dev_close function in the remove function in order to support removing a device without closing its ports. Some drivers do not allocate MAC addresses dynamically or separately. In those cases, the pointer is set to NULL, in order to avoid wrongly freeing them in rte_eth_dev_release_port(). A closed port will have the state RTE_ETH_DEV_UNUSED which is considered as invalid by rte_eth_dev_is_valid_port(). So validity is not checked anymore for closed ports in testpmd. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	662dbc322d	ethdev: remove release function for secondary process After previous changes, the function rte_eth_dev_release_port() can be used for primary or secondary process as well. The only difference with rte_eth_dev_release_port_secondary() is the shared lock used in rte_eth_dev_release_port(). The function rte_eth_dev_release_port_secondary() was recently added in 18.11 cycle. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	e16adf08e5	ethdev: free all common data when releasing port This is a clean-up of common ethdev data freeing. All data freeing are moved to rte_eth_dev_release_port() and done only in case of primary process. It is probably fixing some memory leaks for PMDs which were not freeing all data. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	f6a12685a5	ethdev: fix doxygen comments of shared data fields Some doxygen comments were wrongly associated to the next field because of syntax /** instead of /**< Some other cleanups (like alignment) are done. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Anatoly Burakov	5640171c52	malloc: fix external heap allocation in no-huge mode When no-huge mode is enabled, we always overwrite the socket ID to be SOCKET_ID_ANY in rte_malloc, because there is no NUMA awareness in no-huge mode. However, with external memory support, a socket ID may have other meaning, and we cannot overwrite the socket ID in those cases. Fixes: `65ff37b105` ("malloc: add function to check if socket is external") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-26 22:37:59 +02:00
Yipeng Wang	2d28bb5ddd	hash: remove unnecessary pause There is a rte_pause in hash table reset function. Since the loop is not a polling loop on shared data structure, the rte_pause is not needed. Fixes: `b26473ff8f` ("hash: add reset function") Cc: stable@dpdk.org Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 22:01:37 +02:00
Dan Gora	c6fd54f28c	kni: add function to set link state on kernel interface Add a new API function to KNI, rte_kni_update_link() to allow DPDK applications to update the link status for KNI network interfaces in the linux kernel. Signed-off-by: Dan Gora <dg@adax.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 19:46:15 +02:00
Phil Yang	fd5f33323e	kni: introduce C11 atomic into FIFO synchronization Sync the values by adding C11 atomic memory barriers, so that the values are synced before fifo_write and fifo_read are updated. Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 18:10:14 +02:00
Phil Yang	711859cd0d	kni: fix kernel FIFO synchronization Add memory barriers to make sure the values are synced before updating fifo_write in kni_fifo_put and fifo_read in kni_fifo_get. Fixes: `3fc5ca2f63` ("kni: initial import") Cc: stable@dpdk.org Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 18:10:14 +02:00
Phil Yang	0b05abe7bf	kni: fix FIFO synchronization With the existing code in kni_fifo_put, rx_q values are not being updated before fifo_write is updated. This causes a sync issue on the other core when it reads rx_q in kni_net_rx_normal. The same situation happens in kni_fifo_get as well. So sync the values by adding memory barriers to make sure the values are synced before updating fifo_write and fifo_read. Fixes: `3fc5ca2f63` ("kni: initial import") Cc: stable@dpdk.org Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 18:10:14 +02:00
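The three kni FIFO entries above all describe the same ordering rule: the slot contents must be visible before the index that publishes them. A minimal sketch of that rule with C11 atomics follows. It is not the KNI code; the struct, the function names, and the ring size are hypothetical, and it assumes one producer and one consumer.

```c
#include <stdatomic.h>

#define SKETCH_FIFO_LEN 8u /* hypothetical size; must be a power of two */

/* Hypothetical single-producer/single-consumer FIFO, not the KNI struct. */
struct sketch_fifo {
    _Atomic unsigned write; /* next slot the producer fills */
    _Atomic unsigned read;  /* next slot the consumer drains */
    void *buffer[SKETCH_FIFO_LEN];
};

/* Producer: fill the slots first, then publish them with a release store. */
static unsigned sketch_fifo_put(struct sketch_fifo *f, void **data, unsigned num)
{
    unsigned i;
    unsigned w = atomic_load_explicit(&f->write, memory_order_relaxed);
    unsigned r = atomic_load_explicit(&f->read, memory_order_acquire);

    for (i = 0; i < num; i++) {
        unsigned next = (w + 1) & (SKETCH_FIFO_LEN - 1);
        if (next == r)
            break; /* full: one slot is always left unused */
        f->buffer[w] = data[i];
        w = next;
    }
    /* Release: slot writes above become visible before the new index. */
    atomic_store_explicit(&f->write, w, memory_order_release);
    return i;
}

/* Consumer: acquire the write index, read the slots, then free them. */
static unsigned sketch_fifo_get(struct sketch_fifo *f, void **data, unsigned num)
{
    unsigned i;
    unsigned r = atomic_load_explicit(&f->read, memory_order_relaxed);
    unsigned w = atomic_load_explicit(&f->write, memory_order_acquire);

    for (i = 0; i < num; i++) {
        if (r == w)
            break; /* empty */
        data[i] = f->buffer[r];
        r = (r + 1) & (SKETCH_FIFO_LEN - 1);
    }
    /* Release: slot reads above complete before the slots are reused. */
    atomic_store_explicit(&f->read, r, memory_order_release);
    return i;
}
```

With eight slots the sketch holds at most seven entries, because a full ring is distinguished from an empty one by keeping one slot free.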
Phil Yang	ede56cc18d	config: rename option for C11 memory model Keep only a single config option, RTE_USE_C11_MEM_MODEL, for the C11 memory model, so all modules can leverage the C11 atomic extension by enabling this option. Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-10-26 18:09:22 +02:00
David Hunt	31259a3376	power: fix traffic aware build 1. %ld to PRId64 for 32-bit builds 2. Fix dependency on librte_timer Fixes: `450f079131` ("power: add traffic pattern aware power control") Signed-off-by: David Hunt <david.hunt@intel.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 14:51:36 +02:00
Jerin Jacob	1f8494f002	eal/ppc: support pause API Add support for rte_pause() implementation for ppc64. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>	2018-10-26 14:37:56 +02:00
Honnappa Nagarahalli	e605a1d36c	hash: add lock-free r/w concurrency Add lock-free read-write concurrency. This is achieved by the following changes. 1) Add memory ordering to avoid race conditions. The only race condition that can occur is using the key store element before the key write is completed. Hence, the release memory order is used while inserting the element. Any other race condition is caught by the key comparison. Memory orderings are added only where needed. For example, reads in the writer's context do not need memory ordering as there is a single writer. key_idx in the bucket entry and pdata in the key store element are used for synchronisation. key_idx is used to release an inserted entry in the bucket to the reader. Use of pdata for synchronisation is required because updating an existing entry changes only pdata and leaves key_idx untouched. 2) The reader-writer concurrency issue caused by moving keys to their alternative locations during key insert is solved by introducing a global counter (tbl_chng_cnt) indicating a change in the table. 3) Add the flag to enable reader-writer concurrency at run time. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:50:43 +02:00
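The two mechanisms in the entry above (a release store on key_idx to publish an entry, and a table change counter that makes readers retry) can be sketched in a few lines of C11. This is a toy with two slots and a single writer, not the rte_hash implementation; only the names key_idx and tbl_chng_cnt come from the commit message, everything else is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_SLOTS 2u

/* Hypothetical two-slot table; index 0 of key_store is never used. */
struct sketch_table {
    _Atomic uint32_t tbl_chng_cnt;          /* bumped before a key is moved */
    _Atomic uint32_t key_idx[SKETCH_SLOTS]; /* 0 means "empty slot" */
    uint32_t key_store[4];
};

/* Single writer: write the key first, then publish its index (release). */
static void sketch_insert(struct sketch_table *t, unsigned slot,
                          uint32_t idx, uint32_t key)
{
    t->key_store[idx] = key;
    atomic_store_explicit(&t->key_idx[slot], idx, memory_order_release);
}

/* Single writer: announce the change, then move the entry between slots. */
static void sketch_move(struct sketch_table *t, unsigned from, unsigned to)
{
    uint32_t idx = atomic_load_explicit(&t->key_idx[from],
                                        memory_order_relaxed);

    atomic_fetch_add_explicit(&t->tbl_chng_cnt, 1, memory_order_release);
    /* Keep the counter update ordered before the slot updates below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&t->key_idx[to], idx, memory_order_release);
    atomic_store_explicit(&t->key_idx[from], 0, memory_order_release);
}

/* Reader: search, then retry if the table changed during the search. */
static int sketch_lookup(struct sketch_table *t, uint32_t key)
{
    uint32_t before, after;

    do {
        before = atomic_load_explicit(&t->tbl_chng_cnt,
                                      memory_order_acquire);
        for (unsigned i = 0; i < SKETCH_SLOTS; i++) {
            uint32_t idx = atomic_load_explicit(&t->key_idx[i],
                                                memory_order_acquire);
            if (idx != 0 && t->key_store[idx] == key)
                return (int)idx; /* a hit is validated by the key compare */
        }
        /* Order the slot loads above before re-reading the counter. */
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&t->tbl_chng_cnt,
                                     memory_order_relaxed);
    } while (before != after);

    return -1; /* miss, and no move happened while searching */
}
```

A miss is only trusted when the counter is unchanged, since a key in flight between its two locations could otherwise be overlooked in both.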
Honnappa Nagarahalli	dbdbc4a2e9	hash: fix key store element alignment Fix the key store array element alignment such that every array element is aligned on KEY_ALIGNMENT boundary. This is required to make 'pdata' in 'struct rte_hash_key' align on its natural boundary for atomic load/store. Fixes: `473d1bebce` ("hash: allow to store data in hash table") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:45:40 +02:00
Honnappa Nagarahalli	9d033dac7d	hash: support no free on delete rte_hash_lookup_xxx APIs return the index of slot in the key store. Application(reader) can use that index to reference other data structures in its scope. Because of this, the index should not be freed till the application completes using the index. RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL is introduced to support this. When this flag is enabled rte_hash_del_xxx APIs do not free the key-store index/internal memory associated with the deleted entry. The new API rte_hash_free_key_with_position should be called to free the key-store index/internal memory after calling rte_hash_del_xxx APIs. Suggested-by: Yipeng Wang <yipeng1.wang@intel.com> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:44:52 +02:00
Honnappa Nagarahalli	40f8e9c28c	hash: separate multi-writer from r/w concurrency RW concurrency is required with single writer and multiple reader usecase as well. Hence, multi-writer should not be enabled by default when RW concurrency is enabled. Fixes: `f2e3001b53` ("hash: support read/write concurrency") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:43:52 +02:00
David Hunt	757bf2e7cf	lib/power: add changes for host commands/policies This patch does a couple of things: * Adds a new message type for removing policies (PKT_POLICY_REMOVE) Used when we want to remove a previously created policy. * Adds a core_type bool to the channel packet struct to specify whether the type of core we want to control is virtual or physical. Signed-off-by: David Hunt <david.hunt@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-26 10:48:15 +02:00
Liang Ma	450f079131	power: add traffic pattern aware power control 1. Abstract For packet processing workloads such as DPDK, polling is continuous. This means CPU cores always show 100% busy independent of how much work those cores are doing. Accurately determining how busy a core is matters for the following reasons: * No indication of overload conditions. * User does not know how much real load is on a system, resulting in wasted energy as no power management is utilized. Compared to the original l3fwd-power design, instead of going to sleep after detecting an empty poll, the new mechanism just lowers the core frequency. As a result, the application does not stop polling the device, which leads to improved handling of bursts of traffic. When the system becomes busy, the empty poll mechanism can also increase the core frequency (including turbo) to do best effort for intensive traffic. This gives us more flexible and balanced traffic awareness over the standard l3fwd-power application. 2. Proposed solution The proposed solution focuses on how many times empty polls are executed. A low number of empty polls means the current core is busy processing its workload, so a higher frequency is needed. A high number of empty polls indicates the current core is not doing any real work, so we can lower the frequency to save power. In the current implementation, each core has 1 empty-poll counter, which assumes 1 core is dedicated to 1 queue. This will need to be expanded in the future to support multiple queues per core. 2.1 Power state definition: LOW: Not currently used, reserved for future use. MED: the frequency is used to process modest traffic workload. HIGH: the frequency is used to process busy traffic workload. 2.2 There are two phases to establish the power management system: a. Initialization/Training phase. The training phase is necessary in order to figure out the system polling baseline numbers from idle to busy. 
The highest poll count will be during idle, where all polls are empty. These poll counts will be different between systems due to the many possible processor micro-arch, cache and device configurations, hence the training phase. In the training phase, traffic is blocked so the training algorithm can average the empty-poll numbers for the LOW, MED and HIGH power states in order to create a baseline. Each core's counters are collected every 10ms, and the training phase takes 2 seconds. Training is disabled in the default configuration, and the default parameters are applied. The sample app can still trigger training if needed. Once the training phase has been executed once on a system, the application can then be started with the relevant thresholds provided on the command line, allowing the application to start passing traffic immediately. b. Normal phase. Traffic starts immediately based on the default thresholds, or based on the user supplied thresholds via the command line parameters. The run-time poll counts are compared with the baseline and the decision will be taken to move to MED power state or HIGH power state. The counters are calculated every 10ms. 3. Proposed API 1. rte_power_empty_poll_stat_init(struct ep_params *eptr, uint8_t freq_tlb, struct ep_policy policy); which is used to initialize the power management system. 2. rte_power_empty_poll_stat_free(void); which is used to free the resources held by the power management system. 3. rte_power_empty_poll_stat_update(unsigned int lcore_id); which is used to update a specific core's empty poll counter, not thread safe. 4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt); which is used to update a specific core's valid poll counter, not thread safe. 5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id); which is used to get a specific core's empty poll counter. 6. rte_power_poll_stat_fetch(unsigned int lcore_id); which is used to get a specific core's valid poll counter. 7. 
rte_empty_poll_detection(struct rte_timer tim, void *arg); which is used to detect empty poll state changes then take action. Signed-off-by: Liang Ma <liang.j.ma@intel.com> Reviewed-by: Lei Yao <lei.a.yao@intel.com> Acked-by: David Hunt <david.hunt@intel.com>	2018-10-26 01:55:07 +02:00
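The training and normal phases described in the entry above reduce to two small steps: average the empty-poll samples into a baseline, then compare a run-time count against a threshold derived from it. The sketch below illustrates only that idea. The function names and the percentage-based threshold are assumptions for illustration, not the librte_power policy or its API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical power states; LOW is reserved in the commit message. */
enum sketch_state { SKETCH_MED, SKETCH_HIGH };

/* Training: average the empty-poll samples (one per 10 ms interval). */
static uint64_t sketch_baseline(const uint64_t *samples, size_t n)
{
    uint64_t sum = 0;

    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / n;
}

/*
 * Normal phase: few empty polls relative to the idle baseline means the
 * core is busy, so pick the higher frequency. busy_percent is an assumed
 * tuning knob: the share of the idle baseline below which a core counts
 * as busy.
 */
static enum sketch_state sketch_pick_state(uint64_t empty_polls,
                                           uint64_t idle_baseline,
                                           unsigned busy_percent)
{
    uint64_t threshold = idle_baseline / 100 * busy_percent;

    return empty_polls < threshold ? SKETCH_HIGH : SKETCH_MED;
}
```

With a 10 ms sampling interval and a 2 second training phase, a baseline of this kind would be averaged over 200 samples per core.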
Yipeng Wang	c7d93df552	hash: use partial-key hashing This commit changes the hashing mechanism to "partial-key hashing" to calculate bucket index and signature of key. This is proposed in Bin Fan, et al's paper "MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing". Basically the idea is to use "xor" to derive alternative bucket from current bucket index and signature. With "partial-key hashing", it reduces the bucket memory requirement from two cache lines to one cache line, which improves the memory efficiency and thus the lookup speed. Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 01:04:33 +02:00
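The xor trick in the entry above can be shown in a few lines: the alternative bucket is derived from the current bucket index and the signature, so applying the same step twice returns to the starting bucket. The sketch assumes a 16-bit signature taken from the upper hash bits and a power-of-two bucket count; both choices and all names are illustrative, not confirmed by the commit message.

```c
#include <stdint.h>

/* Hypothetical result type: signature plus both candidate buckets. */
struct sketch_loc {
    uint16_t sig;  /* short signature stored in the bucket entry */
    uint32_t prim; /* primary bucket index */
    uint32_t alt;  /* alternative bucket index */
};

/* Derive the other bucket from the current one and the signature. */
static uint32_t sketch_alt_bucket(uint32_t bucket, uint16_t sig,
                                  uint32_t bucket_mask)
{
    return (bucket ^ sig) & bucket_mask;
}

/* bucket_mask is (number of buckets - 1), with a power-of-two count. */
static struct sketch_loc sketch_locate(uint32_t hash, uint32_t bucket_mask)
{
    struct sketch_loc l;

    l.sig = (uint16_t)(hash >> 16);
    l.prim = hash & bucket_mask;
    l.alt = sketch_alt_bucket(l.prim, l.sig, bucket_mask);
    return l;
}
```

Because xor is its own inverse, an entry can be moved between its two buckets knowing only its current bucket and stored signature, without rehashing the full key.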
Yipeng Wang	75706568a7	hash: add extendable bucket feature In use cases where hash table capacity needs to be guaranteed, the extendable bucket feature can be used to contain extra keys in linked lists when conflicts happen. This is a similar concept to the extendable bucket hash table in the packet framework. This commit adds the extendable bucket feature. The user can turn it on or off through the extra flag field at table creation time. The extendable bucket table is composed of buckets that can be linked as a list to the current main table. When extendable bucket is enabled, the hash table load can always achieve 100%. In other words, the table can always accommodate the same number of keys as the specified table size. This provides a 100% table capacity guarantee. Although keys ending up in the ext buckets may have longer lookup time, they should be rare due to the cuckoo algorithm. Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 01:04:33 +02:00
Yipeng Wang	9904094344	hash: fix race condition in iterate In rte_hash_iterate, the reader lock did not protect the while loop which checks for an empty entry. This created a race condition: the entry could become empty by the time the lock was taken, and a wrong key data value would then be read out. This commit reads out the position in the while condition, which makes sure that the position will not be changed to empty before entering the lock. Fixes: `f2e3001b53` ("hash: support read/write concurrency") Cc: stable@dpdk.org Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 00:33:51 +02:00
Yipeng Wang	86c1ef2090	hash: remove unused constant Since the depth-first search of cuckoo path is removed, we do not need the macro anymore which specifies the depth of the cuckoo search. Fixes: `f2e3001b53` ("hash: support read/write concurrency") Cc: stable@dpdk.org Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 00:00:16 +02:00
Jerin Jacob	a5cba5a2c4	mbuf: add IGMP packet type Add support for IGMP packet type. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2018-10-25 15:51:16 +02:00
Jerin Jacob	8e255bdb1b	mbuf: add MPLS packet type Add support of MPLS packet type. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2018-10-25 15:48:30 +02:00
Jerin Jacob	07e70104e0	mbuf: add FCoE packet type Add support of FCoE packet type. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>	2018-10-25 15:48:24 +02:00
Ferruh Yigit	73aa5c1332	ring: add library version to meson build Fixes: `a3d6026711` ("ring: relax alignment constraint on ring structure") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Luca Boccassi <bluca@debian.org>	2018-10-25 14:30:05 +02:00
Ferruh Yigit	63f54e650a	mbuf: fix library version on meson build Fixes: `d27a626187` ("mbuf: remove control mbuf") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Luca Boccassi <bluca@debian.org>	2018-10-25 14:29:22 +02:00
Rami Rosen	2b4aa851ec	bpf: fix a typo This trivial patch fixes a typo in rte_bpf_ethdev.h. Fixes: `a93ff62a89` ("bpf: introduce basic Rx/Tx filters") Cc: stable@dpdk.org Signed-off-by: Rami Rosen <ramirose@gmail.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2018-10-25 11:27:49 +02:00
Reshma Pattan	77b7485af7	latency: fix timestamp marking and latency calculation The latency calculation logic is not correct for the case where packets get dropped before TX. For dropped packets, the timestamp is not cleared, and such packets still get counted in the latency calculation in subsequent runs, which results in inaccurate latency measurement. Fix this issue as follows. Before setting the timestamp in an mbuf, check that the mbuf does not already have a valid timestamp flag set, and after marking the timestamp, set the mbuf flags to indicate that the timestamp is valid. Before calculating latency, check that the mbuf flags indicate that the timestamp is valid. With the above logic it is guaranteed that correct timestamps are used. Fixes: `5cd3cac9ed` ("latency: added new library for latency stats") Cc: stable@dpdk.org Reported-by: Bao-Long Tran <longtb5@viettel.com.vn> Signed-off-by: Reshma Pattan <reshma.pattan@intel.com> Tested-by: Bao-Long Tran <longtb5@viettel.com.vn> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2018-10-25 10:30:13 +02:00
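The check-then-mark logic in the latency entry above can be sketched as a pair of helpers guarded by a validity flag. The packet struct, the flag bit, and the function names below are hypothetical stand-ins, not the rte_mbuf fields or the latencystats code.

```c
#include <stdint.h>

#define SKETCH_TS_VALID (1ULL << 0) /* hypothetical "timestamp valid" flag */

/* Hypothetical stand-in for the relevant mbuf fields. */
struct sketch_pkt {
    uint64_t flags;
    uint64_t timestamp;
};

/* RX side: stamp only packets that carry no valid timestamp yet. */
static void sketch_mark(struct sketch_pkt *p, uint64_t now)
{
    if (!(p->flags & SKETCH_TS_VALID)) {
        p->timestamp = now;
        p->flags |= SKETCH_TS_VALID;
    }
}

/*
 * TX side: compute latency only when the flag says the timestamp is
 * valid. Returns 1 and writes *latency on success, 0 otherwise.
 */
static int sketch_latency(const struct sketch_pkt *p, uint64_t now,
                          uint64_t *latency)
{
    if (!(p->flags & SKETCH_TS_VALID))
        return 0;
    *latency = now - p->timestamp;
    return 1;
}
```

A packet whose flag is not set is simply skipped on the TX side, so a stale timestamp value in the buffer cannot leak into the statistics.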
Paul Luse	66fd3a3b0f	bus/vdev: fix multi-process IPC buffer leak on scan This patch fixes an issue caught with ASAN where a vdev_scan() to a secondary bus was failing to free some memory. The doxygen comment in EAL is fixed at the same time. Fixes: `cdb068f031` ("bus/vdev: scan by multi-process channel") Fixes: `783b6e5497` ("eal: add synchronous multi-process communication") Cc: stable@dpdk.org Signed-off-by: Paul Luse <paul.e.luse@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-25 10:28:13 +02:00

5276 Commits