Commit Graph

12 Commits

Author SHA1 Message Date
David Hunt
7580f97338 doc: add power management scale mode reaction time note
When using PMD Power Management, scale mode reacts slower than
monitor mode and pause mode. Add note in user guide to this
effect.

Signed-off-by: David Hunt <david.hunt@intel.com>
2021-11-26 14:24:23 +01:00
Anatoly Burakov
f53fe635c1 power: support monitoring multiple Rx queues
Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
Rx queues while entering the energy efficient power state. The multi
version will be used unconditionally if supported, and the UMWAIT one
will only be used when multi-monitor is not supported by the hardware.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Anatoly Burakov
5dff9a72b0 power: support callbacks for multiple Rx queues
Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
  added to the list of queues to poll, so that the callback is aware of
  other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism is
  shared between all queues polled on a particular lcore, and is only
  activated when all queues in the list were polled and were determined
  to have no traffic.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.

Also, while we're at it, update and improve the docs.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
2021-07-09 21:13:13 +02:00
Liang Ma
682a645438 power: add ethdev power management
Add a simple on/off switch that will enable saving power when no
packets are arriving. It is based on counting the number of empty
polls and, when the number reaches a certain threshold, entering an
architecture-defined optimized power state that will either wait
until a TSC timestamp expires, or when packets arrive.

This API mandates a core-to-single-queue mapping (that is, multiple
queued per device are supported, but they have to be polled on different
cores).

This design is using PMD RX callbacks.

1. UMWAIT/UMONITOR:

   When a certain threshold of empty polls is reached, the core will go
   into a power optimized sleep while waiting on an address of next RX
   descriptor to be written to.

2. TPAUSE/Pause instruction

   This method uses the pause (or TPAUSE, if available) instruction to
   avoid busy polling.

3. Frequency scaling
   Reuse existing DPDK power library to scale up/down core frequency
   depending on traffic volume.

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2021-01-29 15:29:48 +01:00
David Hunt
fa77f80f49 doc: fix references in power management guide
In the References section in the Power Management overview,
both links pointed to the same l3fwd-power app. Fix the links
so that one points to l3fwd-power, and the other points to
the vm_power_manager sample app.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
2019-01-20 13:17:48 +01:00
Yong Wang
5036034924 doc: fix a typo in power management guide
This patch fixes a typo in programmer's guide. It should be Frequency,
not Fequence.

Fixes: 450f079131 ("power: add traffic pattern aware power control")
Cc: stable@dpdk.org

Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
2019-01-15 02:40:41 +01:00
Liang Ma
450f079131 power: add traffic pattern aware power control
1. Abstract

For packet processing workloads such as DPDK polling is continuous.
This means CPU cores always show 100% busy independent of how much work
those cores are doing. It is critical to accurately determine how busy
a core is hugely important for the following reasons:

   * No indication of overload conditions.

   * User does not know how much real load is on a system, resulting
     in wasted energy as no power management is utilized.

Compared to the original l3fwd-power design, instead of going to sleep
after detecting an empty poll, the new mechanism just lowers the core
frequency. As a result, the application does not stop polling the device,
which leads to improved handling of bursts of traffic.

When the system become busy, the empty poll mechanism can also increase the
core frequency (including turbo) to do best effort for intensive traffic.
This gives us more flexible and balanced traffic awareness over the
standard l3fwd-power application.

2. Proposed solution

The proposed solution focuses on how many times empty polls are executed.
The less the number of empty polls, means current core is busy with
processing workload, therefore, the higher frequency is needed. The high
empty poll number indicates the current core not doing any real work
therefore, we can lower the frequency to safe power.

In the current implementation, each core has 1 empty-poll counter which
assume 1 core is dedicated to 1 queue. This will need to be expanded in the
future to support multiple queues per core.

2.1 Power state definition:

	LOW:  Not currently used, reserved for future use.

	MED:  the frequency is used to process modest traffic workload.

	HIGH: the frequency is used to process busy traffic workload.

2.2 There are two phases to establish the power management system:

	a.Initialization/Training phase. The training phase is necessary
	  in order to figure out the system polling baseline numbers from
	  idle to busy. The highest poll count will be during idle, where
	  all polls are empty. These poll counts will be different between
	  systems due to the many possible processor micro-arch, cache
	  and device configurations, hence the training phase.
	  In the training phase, traffic is blocked so the training
	  algorithm can average the empty-poll numbers for the LOW, MED and
	  HIGH  power states in order to create a baseline.
	  The core's counter are collected every 10ms, and the Training
	  phase will take 2 seconds.
	  Training is disabled as default configuration. The default
	  parameter is applied. Sample App still can trigger training
	  if that's needed. Once the training phase has been executed once on
	  a system, the application can then be started with the relevant
	  thresholds provided on the command line, allowing the application
	  to start passing start traffic immediately

	b.Normal phase. Traffic starts immediately based on the default
	  thresholds, or based on the user supplied thresholds via the
	  command line parameters. The run-time poll counts are compared with
	  the baseline and the decision will be taken to move to MED power
	  state or HIGH power state. The counters are calculated every 10ms.

3. Proposed  API

1.  rte_power_empty_poll_stat_init(struct ep_params **eptr,
		uint8_t *freq_tlb, struct ep_policy *policy);
which is used to initialize the power management system.
 
2.  rte_power_empty_poll_stat_free(void);
which is used to free the resource hold by power management system.
 
3.  rte_power_empty_poll_stat_update(unsigned int lcore_id);
which is used to update specific core empty poll counter, not thread safe
 
4.  rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
which is used to update specific core valid poll counter, not thread safe
 
5.  rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
which is used to get specific core empty poll counter.
 
6.  rte_power_poll_stat_fetch(unsigned int lcore_id);
which is used to get specific core valid poll counter.

7.  rte_empty_poll_detection(struct rte_timer *tim, void *arg);
which is used to detect empty poll state changes then take action.

Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
2018-10-26 01:55:07 +02:00
Ferruh Yigit
5630257fcc doc: convert Intel license headers to SPDX tags
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-02-06 23:27:08 +01:00
David Hunt
d9e71f5227 doc: add hyperthreading note to power library guide
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
2018-02-06 22:29:26 +01:00
David Hunt
94608a0f7f power: add per-core turbo boost API
Adds a new set of APIs to allow per-core turbo
enable-disable.

Signed-off-by: David Hunt <david.hunt@intel.com>
2017-09-22 16:35:12 +02:00
Siobhan Butler
48624fd96e doc: remove Intel references from prog guide
Removed redundant references to Intel(R) DPDK in Programmers Guide.

Signed-off-by: Siobhan Butler <siobhan.a.butler@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
2014-12-19 23:30:26 +01:00
Bernard Iremonger
fc1f2750a3 doc: programmers guide
The 1.7 DPDK_Prog_Guide document in MSWord has been converted to rst format for
use with Sphinx. There is an rst file for each chapter and an index.rst file
which contains the table of contents.
The top level index file has been modified to include this guide.

This document contains some png image files. If any of these png files are modified
they should be replaced with an svg file.

This is the sixth document from a set of 6 documents.

Signed-off-by:  Bernard Iremonger <bernard.iremonger@intel.com>
2014-11-18 14:49:54 +01:00