numam-dpdk

Author	SHA1	Message	Date
Kevin Laatz	57ae0ec626	build: add dependency on telemetry to apps with meson This patch adds telemetry as a dependency to all applications. Without these changes, the --telemetry flag will not be recognised and applications will fail to run if they want to enable telemetry. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:21:33 +02:00
Ciara Power	c8e76f5ac3	telemetry: add ability to disable selftest This patch adds functionality to enable/disable the selftest. This functionality will be extended in future to make the enabling/disabling more dynamic and remove this 'hardcoded' approach. We are temporarily using this approach due to the design changes (vdev vs eal) made to the library. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	0fe3a37924	telemetry: format json response when sending stats This patch adds functionality to create a JSON message in order to send it to a client socket. When stats are requested by a client, they are retrieved from the metrics library and encoded in JSON format. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	67c3c2de48	telemetry: update metrics before sending stats This patch adds functionality to update the statistics in the metrics library with values from the ethdev stats. Values need to be updated before they are encoded into a JSON message and sent to the client that requested them. The JSON encoding will be added in a subsequent patch. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	1b756087db	telemetry: add parser for client socket messages This patch adds the parser file. This is used to parse any messages that are received on any of the client sockets. Currently, the unregister functionality works using the parser. Functionality relating to getting statistic values for certain ports will be added in a subsequent patch, however the parsing involved for that command is added in this patch. Some of the parser code included is in preparation for future functionality, that is not implemented yet in this patchset. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	ee5ff0d329	telemetry: add client feature and sockets This patch introduces clients to the telemetry API. When a client makes a connection through the initial telemetry socket, they can send a message through the socket to be parsed. Register messages are expected through this socket, to enable clients to register and have a client socket set up for future communications. A TAILQ is used to store all client information. Using this, the client sockets are polled for messages, which will later be parsed and dealt with accordingly. Functionality that makes use of the client sockets is also introduced in this patch, such as writing to client sockets and sending error responses. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	fdbdb3f9ce	telemetry: add initial connection socket This patch adds the telemetry UNIX socket. It is used to allow connections from external clients. On the initial connection from a client, ethdev stats are registered in the metrics library, to allow for their retrieval at a later stage. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:23 +02:00
Ciara Power	8877ac688b	telemetry: introduce infrastructure This patch adds the infrastructure and initial code for the telemetry library. The telemetry init is registered with eal_init(). We can then check to see if --telemetry was passed as an eal option. If --telemetry was parsed, then we call telemetry init at the end of eal init. Control threads are used to get CPU cycles for telemetry, which are configured in this patch also. Signed-off-by: Ciara Power <ciara.power@intel.com> Signed-off-by: Brian Archbold <brian.archbold@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 15:18:20 +02:00
Kevin Laatz	6911c9fd8f	eal: export function to get runtime directory This patch makes the eal_get_runtime_dir() API public so it can be used from outside EAL. Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 12:10:24 +02:00
Kevin Laatz	2395332798	eal: add option register infrastructure This commit adds infrastructure to EAL that allows an application to register its init function with EAL. This allows libraries to be initialized at the end of EAL init. This infrastructure allows libraries that depend on EAL to be initialized as part of EAL init, removing circular dependency issues. Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2018-10-27 12:10:10 +02:00
Stephen Hemminger	f4336b4388	ethdev: make offload name API non-experimental The offload name functions are useful, but since they are marked experimental they can not be used by upstream projects. For example, VPP duplicates the same table in its code. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Ilya Maximets	a51639cc72	eal: add nanosleep based delay function Add a new rte_delay_us_sleep() function that uses nanosleep(). This function can be used by applications to not implement their own nanosleep() based callback and by internal DPDK code if a CPU non-blocking delay is needed. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	8fae42404c	ethdev: fix iterator default behaviour for representors The iterator was matching all representors if it was not specified in the devargs string. It was a wrong default behaviour. If there is no representor parameter in the devargs, the iterator should not match any representor port. The implementation of the default behaviour would be simpler if a "no match" handler is added to rte_kvargs_process(). As it requires an API breakage, it will be reworked later. Fixes: a7d3c6271d55 ("ethdev: support representor id as iterator filter") Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	336f20bc5e	ethdev: filter destroy event before probed If a port is being created and rolled back because of an error, the event RTE_ETH_EVENT_DESTROY should not be sent. It makes no sense to receive a destroy event for a port which was not yet announced via RTE_ETH_EVENT_NEW. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Olivier Matz	140af04e63	net: support MPLS in software packet type parser Add RTE_PTYPE_L2_ETHER_MPLS packet type support in rte_net_get_ptype(). Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Olivier Matz	e480cf487a	net: add MPLS header structure Add the MPLS header structure in librte_net. It will be used by the next patch, which adds support for the MPLS L2 layer in the software packet type parser. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Tiwei Bie	0dcdf32e64	vhost: initialize postcopy ufd properly Currently, postcopy_ufd is initialized to 0 implicitly, so fd 0 could be closed unexpectedly by vhost_backend_cleanup(). Fix this issue by initializing postcopy_ufd to -1 explicitly. Fixes: 9eefef3b5970 ("vhost: introduce postcopy advise message") Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-10-26 22:14:06 +02:00
Maxime Coquelin	e988a6d845	vhost: avoid memory barriers when no descriptors dequeued In both split and packed dequeue paths, flush_shadow_used_ring and vhost_ring_call variants get called even if no packets have been dequeued, and so no descriptor updates happened. It has an impact on CPU pipeline, as memory barriers are used in these functions. This patch does not call these functions if no descriptors have been dequeued. The performance gain with split ring when dequeue zero-copy is disabled should be null, but should be noticeable with packed ring or dequeue zero-copy enabled. Fixes: ae999ce49dcb ("vhost: add Tx support for packed ring") Fixes: 915cf9404225 ("vhost: use shadow used ring in dequeue path") Cc: stable@dpdk.org Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com> Tested-by: Jens Freimann <jfreimann@redhat.com> Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	c10cdce180	ethdev: support MAC address as iterator filter The MAC addresses of a port can be matched with devargs. As the conflict between rte_ether.h and netinet/ether.h is not resolved, the MAC parsing is done with a rte_cmdline function. As a result, cmdline library becomes a dependency of ethdev. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	a7d3c6271d	ethdev: support representor id as iterator filter The representor id is added in rte_eth_dev_data in order to be able to match a port with its representor id in devargs. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	7f07e7d794	ethdev: move representor parsing functions The functions for representor devargs parsing were static in the file rte_ethdev.c. In order to reuse them in the file rte_class_eth.c, they are moved to the files ethdev_private.c/.h. A log is fixed by adding a missing line feed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	cc0579f233	kvargs: support list value If a value contains a comma, rte_kvargs_tokenize() will split here. In order to support list syntax [a,b] as value, an extra parsing of the square brackets is added. Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	01e5b16c57	eal: remove deprecated attach/detach functions These hotplug functions were deprecated and have some new replacements. As announced earlier, the oldest ones are now removed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	c9cce42876	ethdev: remove deprecated attach/detach functions The hotplug attach/detach features are implemented in EAL layer. There is a new ethdev iterator to retrieve ports from ethdev layer. As announced earlier, the (buggy) ethdev functions are now removed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	8b9ea3b3ca	ethdev: allow iterating with pure class filter If no rte_device is given in the iterator, eth_dev_match() is looking at all ports without any restriction, except the ethdev kvargs filter. This allows iterating with a devargs filter referencing only some ethdev parameters. The format (from the new devargs syntax) is: class=eth,paramY=Y Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	214ed1acd1	ethdev: add iterator to match devargs input The iterator will return the ethdev port ids matching a devargs string. It is recommended to use the macro RTE_ETH_FOREACH_MATCHING_DEV() for usage convenience. The class string is prefixed with '+' in order to skip the validation of the parameter keys. It is tolerated for the compatibility with the old (current) syntax where all parameters (bus, class and driver) are mixed in the same string without any delimiter. Thanks to this compatibility prefix, the driver parameters will be skipped during the ethdev parsing, and not considered invalid. A macro is introduced in rte_common.h to workaround a const field. This hack is needed to free const strings in the iterator. It is preferred to keep the const for these fields, because it gives a hint that they are not changed at each iteration. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Tiwei Bie	16b9e38e74	vhost: fix vector filling for packed ring We should return the length of the buffers described by the current descriptor chain after filling the buffer vector. So we need to zero the *len first. Fixes: 2f3225a7d69b ("vhost: add vector filling support for packed ring") Cc: stable@dpdk.org Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-10-26 22:14:05 +02:00
Timothy Redaelli	1726e9994c	vhost/crypto: fix shared lib build without cryptodev Currently it's not possible to build DPDK as shared library with cryptodev disabled since vhost is trying to link with rte_crypto, but rte_crypto and rte_hash are only needed when you build vhost_crypto and so only when cryptodev is enabled. This patch fixes this by linking rte_vhost with rte_crypto and rte_hash only when cryptodev is enabled. Fixes: b4ca81298613 ("vhost/crypto: fix build without cryptodev") Fixes: 939066d96563 ("vhost/crypto: add public function implementation") Cc: stable@dpdk.org Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-10-26 22:14:05 +02:00
Ori Kam	7307cf6333	ethdev: add raw encapsulation action Currently the encap/decap actions only support encapsulation of VXLAN and NVGRE L2 packets (L2 encapsulation is where the inner packet has a valid Ethernet header, while L3 encapsulation is where the inner packet doesn't have the Ethernet header). In addition, the parameter to the encap action is a list of rte items, which results in 2 extra translations: from the application to the action, and from the action to the NIC. This has a negative impact on the insertion performance. Looking forward, there is going to be a need to support many more tunnel encapsulations, for example MPLSoGRE and MPLSoUDP. Adding the new encapsulations will result in duplication of code. For example, the code for handling NVGRE and VXLAN is exactly the same, and each new tunnel will have the same exact structure. This patch introduces a raw encapsulation that can support L2 tunnel types and L3 tunnel types. In addition, the new encapsulation commands use a raw buffer in order to save the conversion time, both for the application and the PMD. In order to encapsulate an L3 tunnel type, there is a need to use both actions in the same rule: the decap to remove the L2 of the original packet, and then the encap command to encapsulate the packet with the tunnel. For decap L3 there is also a need to use both commands in the same flow: first the decap command to remove the outer tunnel header, and then encap to add the L2 header. Signed-off-by: Ori Kam <orika@mellanox.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Dekel Peled	839b20be0e	ethdev: support metadata as flow rule criteria As described in [1], a new rte_flow item is added to support metadata to use as flow rule match pattern. The metadata is an opaque item, fully controlled by the application. The use of metadata is relevant for egress rules only. It can be set in the flow rule using the RTE_FLOW_ITEM_META. An additional member 'tx_metadata' is added in union with existing member 'hash' of struct 'rte_mbuf', located to avoid conflicts with existing fields. This additional member is used to carry the metadata item. Application should set the packet metadata in the mbuf dedicated field, and set the PKT_TX_METADATA flag in the mbuf->ol_flags. The NIC will use the packet metadata as match criteria for relevant flow rules. This patch introduces metadata item type for rte_flow RTE_FLOW_ITEM_META, along with corresponding struct rte_flow_item_meta and ol_flag PKT_TX_METADATA. [1] "[RFC,v2] ethdev: support metadata as flow rule criteria" Signed-off-by: Dekel Peled <dekelp@mellanox.com> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	23ea57a2a0	ethdev: complete closing of port After closing a port, it cannot be restarted. So there is no reason to not free all associated resources. The last step was done with rte_eth_dev_detach() which is deprecated. Instead of blindly removing the associated rte_device, the driver should check that no more ports (ethdev, cryptodev, etc) are open for the device. The last ethdev freeing steps, which were done by rte_eth_dev_detach(), are now done at the end of rte_eth_dev_close() if the driver supports the flag RTE_ETH_DEV_CLOSE_REMOVE. There will be a transition period for PMDs to enable this new flag and migrate to the new behaviour. When enabling RTE_ETH_DEV_CLOSE_REMOVE, the PMD must free all its private resources for the port, in its dev_close function. It is advised to call the dev_close function in the remove function in order to support removing a device without closing its ports. Some drivers do not allocate MAC addresses dynamically or separately. In those cases, the pointer is set to NULL, in order to avoid wrongly freeing them in rte_eth_dev_release_port(). A closed port will have the state RTE_ETH_DEV_UNUSED which is considered as invalid by rte_eth_dev_is_valid_port(). So validity is not checked anymore for closed ports in testpmd. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	662dbc322d	ethdev: remove release function for secondary process After previous changes, the function rte_eth_dev_release_port() can be used for primary or secondary process as well. The only difference with rte_eth_dev_release_port_secondary() is the shared lock used in rte_eth_dev_release_port(). The function rte_eth_dev_release_port_secondary() was recently added in 18.11 cycle. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	e16adf08e5	ethdev: free all common data when releasing port This is a clean-up of common ethdev data freeing. All data freeing is moved to rte_eth_dev_release_port() and done only in case of primary process. It is probably fixing some memory leaks for PMDs which were not freeing all data. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	f6a12685a5	ethdev: fix doxygen comments of shared data fields Some doxygen comments were wrongly associated to the next field because of syntax /** instead of /**< Some other cleanups (like alignment) are done. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Anatoly Burakov	5640171c52	malloc: fix external heap allocation in no-huge mode When no-huge mode is enabled, we always overwrite the socket ID to be SOCKET_ID_ANY in rte_malloc, because there is no NUMA awareness in no-huge mode. However, with external memory support, a socket ID may have other meaning, and we cannot overwrite the socket ID in those cases. Fixes: 65ff37b105f7 ("malloc: add function to check if socket is external") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-26 22:37:59 +02:00
Yipeng Wang	2d28bb5ddd	hash: remove unnecessary pause There is a rte_pause in hash table reset function. Since the loop is not a polling loop on shared data structure, the rte_pause is not needed. Fixes: b26473ff8f4a ("hash: add reset function") Cc: stable@dpdk.org Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 22:01:37 +02:00
Dan Gora	c6fd54f28c	kni: add function to set link state on kernel interface Add a new API function to KNI, rte_kni_update_link() to allow DPDK applications to update the link status for KNI network interfaces in the linux kernel. Signed-off-by: Dan Gora <dg@adax.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 19:46:15 +02:00
Phil Yang	fd5f33323e	kni: introduce C11 atomic into FIFO synchronization Sync the values by adding C11 atomic memory barriers, to make sure the values are synced before updating fifo_write and fifo_read. Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 18:10:14 +02:00
Phil Yang	711859cd0d	kni: fix kernel FIFO synchronization Add a memory barrier to make sure the values are synced before updating fifo_write in kni_fifo_put and fifo_read in kni_fifo_get. Fixes: 3fc5ca2f6352 ("kni: initial import") Cc: stable@dpdk.org Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 18:10:14 +02:00
Phil Yang	0b05abe7bf	kni: fix FIFO synchronization With the existing code in kni_fifo_put, rx_q values are not being updated before updating fifo_write. While reading rx_q in kni_net_rx_normal, this causes a sync issue on the other core. The same situation happens in kni_fifo_get as well. So sync the values by adding memory barriers, to make sure the values are synced before updating fifo_write and fifo_read. Fixes: 3fc5ca2f6352 ("kni: initial import") Cc: stable@dpdk.org Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 18:10:14 +02:00
Phil Yang	ede56cc18d	config: rename option for C11 memory model Keep only a single config option RTE_USE_C11_MEM_MODEL for the C11 memory model, so all modules can leverage the C11 atomic extension by enabling this option. Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-10-26 18:09:22 +02:00
David Hunt	31259a3376	power: fix traffic aware build 1. %ld to PRId64 for 32-bit builds 2. Fix dependency on librte_timer Fixes: 450f0791312c ("power: add traffic pattern aware power control") Signed-off-by: David Hunt <david.hunt@intel.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 14:51:36 +02:00
Jerin Jacob	1f8494f002	eal/ppc: support pause API Add support for rte_pause() implementation for ppc64. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>	2018-10-26 14:37:56 +02:00
Honnappa Nagarahalli	e605a1d36c	hash: add lock-free r/w concurrency Add lock-free read-write concurrency. This is achieved by the following changes. 1) Add memory ordering to avoid race conditions. The only race condition that can occur is - using the key store element before the key write is completed. Hence, while inserting the element the release memory order is used. Any other race condition is caught by the key comparison. Memory orderings are added only where needed. For ex: reads in the writer's context do not need memory ordering as there is a single writer. key_idx in the bucket entry and pdata in the key store element are used for synchronisation. key_idx is used to release an inserted entry in the bucket to the reader. Use of pdata for synchronisation is required due to updation of an existing entry where-in only the pdata is updated without updating key_idx. 2) Reader-writer concurrency issue, caused by moving the keys to their alternative locations during key insert, is solved by introducing a global counter(tbl_chng_cnt) indicating a change in table. 3) Add the flag to enable reader-writer concurrency during run time. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:50:43 +02:00
Honnappa Nagarahalli	dbdbc4a2e9	hash: fix key store element alignment Fix the key store array element alignment such that every array element is aligned on KEY_ALIGNMENT boundary. This is required to make 'pdata' in 'struct rte_hash_key' align on its natural boundary for atomic load/store. Fixes: 473d1bebce43 ("hash: allow to store data in hash table") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:45:40 +02:00
Honnappa Nagarahalli	9d033dac7d	hash: support no free on delete rte_hash_lookup_xxx APIs return the index of slot in the key store. Application(reader) can use that index to reference other data structures in its scope. Because of this, the index should not be freed till the application completes using the index. RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL is introduced to support this. When this flag is enabled rte_hash_del_xxx APIs do not free the key-store index/internal memory associated with the deleted entry. The new API rte_hash_free_key_with_position should be called to free the key-store index/internal memory after calling rte_hash_del_xxx APIs. Suggested-by: Yipeng Wang <yipeng1.wang@intel.com> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:44:52 +02:00
Honnappa Nagarahalli	40f8e9c28c	hash: separate multi-writer from r/w concurrency RW concurrency is required with single writer and multiple reader usecase as well. Hence, multi-writer should not be enabled by default when RW concurrency is enabled. Fixes: f2e3001b53ec ("hash: support read/write concurrency") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:43:52 +02:00
David Hunt	757bf2e7cf	lib/power: add changes for host commands/policies This patch does a couple of things: * Adds a new message type for removing policies (PKT_POLICY_REMOVE) Used when we want to remove a previously created policy. * Adds a core_type bool to the channel packet struct to specify whether the type of core we want to control is virtual or physical. Signed-off-by: David Hunt <david.hunt@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-26 10:48:15 +02:00
Liang Ma	450f079131	power: add traffic pattern aware power control 1. Abstract For packet processing workloads such as DPDK, polling is continuous. This means CPU cores always show 100% busy independent of how much work those cores are doing. It is critical to accurately determine how busy a core is, for the following reasons: * No indication of overload conditions. * User does not know how much real load is on a system, resulting in wasted energy as no power management is utilized. Compared to the original l3fwd-power design, instead of going to sleep after detecting an empty poll, the new mechanism just lowers the core frequency. As a result, the application does not stop polling the device, which leads to improved handling of bursts of traffic. When the system becomes busy, the empty poll mechanism can also increase the core frequency (including turbo) to do best effort for intensive traffic. This gives us more flexible and balanced traffic awareness over the standard l3fwd-power application. 2. Proposed solution The proposed solution focuses on how many times empty polls are executed. A lower number of empty polls means the current core is busy processing workload, and therefore a higher frequency is needed. A high empty poll number indicates the current core is not doing any real work, and therefore we can lower the frequency to save power. In the current implementation, each core has 1 empty-poll counter, which assumes 1 core is dedicated to 1 queue. This will need to be expanded in the future to support multiple queues per core. 2.1 Power state definition: LOW: Not currently used, reserved for future use. MED: the frequency is used to process modest traffic workload. HIGH: the frequency is used to process busy traffic workload. 2.2 There are two phases to establish the power management system: a. Initialization/Training phase. The training phase is necessary in order to figure out the system polling baseline numbers from idle to busy. The highest poll count will be during idle, where all polls are empty. These poll counts will be different between systems due to the many possible processor micro-arch, cache and device configurations, hence the training phase. In the training phase, traffic is blocked so the training algorithm can average the empty-poll numbers for the LOW, MED and HIGH power states in order to create a baseline. The core's counters are collected every 10ms, and the training phase will take 2 seconds. Training is disabled in the default configuration, and the default parameters are applied. The sample app can still trigger training if needed. Once the training phase has been executed once on a system, the application can then be started with the relevant thresholds provided on the command line, allowing the application to start passing traffic immediately. b. Normal phase. Traffic starts immediately based on the default thresholds, or based on the user supplied thresholds via the command line parameters. The run-time poll counts are compared with the baseline and the decision will be taken to move to MED power state or HIGH power state. The counters are calculated every 10ms. 3. Proposed API 1. rte_power_empty_poll_stat_init(struct ep_params *eptr, uint8_t freq_tlb, struct ep_policy policy); which is used to initialize the power management system. 2. rte_power_empty_poll_stat_free(void); which is used to free the resources held by the power management system. 3. rte_power_empty_poll_stat_update(unsigned int lcore_id); which is used to update a specific core's empty poll counter, not thread safe. 4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt); which is used to update a specific core's valid poll counter, not thread safe. 5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id); which is used to get a specific core's empty poll counter. 6. rte_power_poll_stat_fetch(unsigned int lcore_id); which is used to get a specific core's valid poll counter. 7. rte_empty_poll_detection(struct rte_timer tim, void *arg); which is used to detect empty poll state changes then take action. Signed-off-by: Liang Ma <liang.j.ma@intel.com> Reviewed-by: Lei Yao <lei.a.yao@intel.com> Acked-by: David Hunt <david.hunt@intel.com>	2018-10-26 01:55:07 +02:00
Yipeng Wang	c7d93df552	hash: use partial-key hashing This commit changes the hashing mechanism to "partial-key hashing" to calculate bucket index and signature of key. This is proposed in Bin Fan, et al's paper "MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing". Basically the idea is to use "xor" to derive alternative bucket from current bucket index and signature. With "partial-key hashing", it reduces the bucket memory requirement from two cache lines to one cache line, which improves the memory efficiency and thus the lookup speed. Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 01:04:33 +02:00
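
The "partial-key hashing" idea in the last commit above (c7d93df552) can be illustrated with a small sketch. This is not the librte_hash code: the table size, the 16-bit signature width, and the function names `split_hash` and `alt_bucket` are assumptions chosen for illustration. It only shows how an alternative bucket can be derived with "xor" from the current bucket index and the signature, without the full key or a second stored hash.

```python
# Illustrative sketch of partial-key hashing; all names and sizes are
# hypothetical and not taken from the DPDK sources.
NUM_BUCKETS = 1 << 10            # assumed power-of-two table size
BUCKET_MASK = NUM_BUCKETS - 1


def split_hash(h):
    """Split a 32-bit hash into (primary bucket index, 16-bit signature)."""
    h &= 0xFFFFFFFF
    sig = h >> 16                # upper bits serve as the short signature
    prim = h & BUCKET_MASK       # lower bits select the primary bucket
    return prim, sig


def alt_bucket(bucket, sig):
    """Derive the other candidate bucket from the current index and signature."""
    return (bucket ^ sig) & BUCKET_MASK


prim, sig = split_hash(0xDEADBEEF)
alt = alt_bucket(prim, sig)
# XOR is its own inverse, so the same function maps alt back to prim.
print(prim, alt, alt_bucket(alt, sig))
```

Because the mapping is symmetric, an entry can be moved between its two candidate buckets knowing only where it currently sits and its stored signature. One limitation of this sketch: when the masked signature is zero, both candidates are the same bucket, and that case is not handled here.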

4887 Commits