In the power_set_governor_*() functions, we using fputs() on /sys
filesystem. However, we also need to call fflush() to ensure that
the write completes successfully. Otherwise the attempt to set the
power governor fails and the function returns as if it has
succeeded. This patch adds an fflush to ensure that the
write succeeds, otherwise returns an error.
Fixes: e6c6dc0f96 ("power: add p-state driver compatibility")
Signed-off-by: David Hunt <david.hunt@intel.com>
Previously, in order to use the power library, it was necessary
for the user to disable the intel_pstate driver by adding
“intel_pstate=disable” to the kernel command line for the system,
which causes the acpi_cpufreq driver to be loaded in its place.
This patch adds the ability for the power library use the intel-pstate
driver.
It adds a new suite of functions behind the current power library API,
and will seamlessly set up the user facing API function pointers to
the relevant functions depending on whether the system is running with
acpi_cpufreq kernel driver, intel_pstate kernel driver or in a guest,
using kvm. The library API and ABI is unchanged.
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
This patch does a couple of things:
* Adds a new message type for removing policies (PKT_POLICY_REMOVE)
Used when we want to remove a previously created policy.
* Adds a core_type bool to the channel packet struct to specify whether
the type of core we want to control is virtual or physical.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
1. Abstract
For packet processing workloads such as DPDK polling is continuous.
This means CPU cores always show 100% busy independent of how much work
those cores are doing. It is critical to accurately determine how busy
a core is hugely important for the following reasons:
* No indication of overload conditions.
* User does not know how much real load is on a system, resulting
in wasted energy as no power management is utilized.
Compared to the original l3fwd-power design, instead of going to sleep
after detecting an empty poll, the new mechanism just lowers the core
frequency. As a result, the application does not stop polling the device,
which leads to improved handling of bursts of traffic.
When the system become busy, the empty poll mechanism can also increase the
core frequency (including turbo) to do best effort for intensive traffic.
This gives us more flexible and balanced traffic awareness over the
standard l3fwd-power application.
2. Proposed solution
The proposed solution focuses on how many times empty polls are executed.
The less the number of empty polls, means current core is busy with
processing workload, therefore, the higher frequency is needed. The high
empty poll number indicates the current core not doing any real work
therefore, we can lower the frequency to safe power.
In the current implementation, each core has 1 empty-poll counter which
assume 1 core is dedicated to 1 queue. This will need to be expanded in the
future to support multiple queues per core.
2.1 Power state definition:
LOW: Not currently used, reserved for future use.
MED: the frequency is used to process modest traffic workload.
HIGH: the frequency is used to process busy traffic workload.
2.2 There are two phases to establish the power management system:
a.Initialization/Training phase. The training phase is necessary
in order to figure out the system polling baseline numbers from
idle to busy. The highest poll count will be during idle, where
all polls are empty. These poll counts will be different between
systems due to the many possible processor micro-arch, cache
and device configurations, hence the training phase.
In the training phase, traffic is blocked so the training
algorithm can average the empty-poll numbers for the LOW, MED and
HIGH power states in order to create a baseline.
The core's counter are collected every 10ms, and the Training
phase will take 2 seconds.
Training is disabled as default configuration. The default
parameter is applied. Sample App still can trigger training
if that's needed. Once the training phase has been executed once on
a system, the application can then be started with the relevant
thresholds provided on the command line, allowing the application
to start passing start traffic immediately
b.Normal phase. Traffic starts immediately based on the default
thresholds, or based on the user supplied thresholds via the
command line parameters. The run-time poll counts are compared with
the baseline and the decision will be taken to move to MED power
state or HIGH power state. The counters are calculated every 10ms.
3. Proposed API
1. rte_power_empty_poll_stat_init(struct ep_params **eptr,
uint8_t *freq_tlb, struct ep_policy *policy);
which is used to initialize the power management system.
2. rte_power_empty_poll_stat_free(void);
which is used to free the resource hold by power management system.
3. rte_power_empty_poll_stat_update(unsigned int lcore_id);
which is used to update specific core empty poll counter, not thread safe
4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
which is used to update specific core valid poll counter, not thread safe
5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
which is used to get specific core empty poll counter.
6. rte_power_poll_stat_fetch(unsigned int lcore_id);
which is used to get specific core valid poll counter.
7. rte_empty_poll_detection(struct rte_timer *tim, void *arg);
which is used to detect empty poll state changes then take action.
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Add the capability for the vm_power_manager to receive
a policy of type BRANCH_RATIO. This will add any vcpus
in the policy to the oob monitoring thread.
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Radu Nicolau <radu.nicolau@intel.com>
New API added, rte_power_get_capabilities(), that allows the
application to query the power and performance capabilities
of the CPU cores.
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Add non-EAL libraries to DPDK build. The compat lib is a special case,
along with the previously-added EAL, but all other libs can be build using
the same set of commands, where the individual meson.build files only need
to specify their dependencies, source files, header files and ABI versions.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
rename private header file rte_power_kvm_vm.c
to power_kvm_vm.c. This prevents the private
functions from leaking into the documentation.
Change any private functions from
rte_<function_name> to just <function_name>.
Reserve the rte_ for public functions
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Rename private header file rte_power_acpi_cpufreq.c
to power_acpi_cpufreq.c.This prevents the private
functions from leaking into the documentation.
Change any private functions from rte_<function_name>
to just <function_name>.Reserve the rte_ for public functions.
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Rename private header file rte_power_common.h
to power_common.h to prevent private functions
from leaking into the documentation.
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Since this patch-set attempts to clean up the power library,
and there are many instances of "unsigned" caught by checkpatch,
it was decided to clean these up first rather than have them included
in the later patches in the patch set. And would also minimise this
type of error being caught by checkpatch in the future
Signed-off-by: Marko Kovacevic <marko.kovacevic@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Repeated occurrences of 'the'.
The change was obtained using the following command:
sed -i "s;the the ;the ;" `git grep -l "the "`
Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The list of libraries in LDLIBS was generated from the DEPDIRS-xyz
variable. This is valid when the subdirectory name match the library
name, but it's not always the case, especially for PMDs.
The patches removes this feature and explicitly adds the proper
libraries in LDLIBS.
Some DEPDIRS-xyz variables become useless, remove them.
Reported-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
Adding new wrapper function to existing private (but unused 'till now)
function with an rte_power_ prefix.
The plan is to clean up all the header files in the next release so
that only the intended public functions are in the map file and only
the relevant headers have the rte_ prefix so that only they are
included in the documentation.
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Before this patch, the management of dependencies between directories
had several issues:
- the generation of .depdirs, done at configuration is slow: it can take
more than one minute on some slow targets (usually ~10s on a standard
PC without -j).
- for instance, it is possible to express a dependency like:
- app/foo depends on lib/librte_foo
- and lib/librte_foo depends on app/bar
But this won't work because the directories are traversed with a
depth-first algorithm, so we have to choose between doing 'app' before
or after 'lib'.
- the script depdirs-rule.sh is too complex.
- we cannot use "make -d" for debug, because the output of make is used for
the generation of .depdirs.
This patch moves the DEPDIRS-* variables in the upper Makefile, making
the dependencies much easier to calculate. A DEPDIRS variable is still
used to process library dependencies in LDLIBS.
After this commit, "make config" is almost immediate.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Robin Jarry <robin.jarry@6wind.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Macro CHANNEL_CMDS_MAX_CPUS stand for the maximum number of cores
controlled by virtual channels. This macro only be used in the example,
so remove it from library to example header file.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Function strerror(errno) has built strings only for non-negative errno values.
for negative values of errno it describe error as "Unknown error -errno"
to be more descriptive i put string "channel not found" taken from header.
The negative argument will be interpreted as a very large unsigned value.
Coverity issue: 13266
Coverity issue: 13269
Fixes: 445c6528b5 ("power: common interface for guest and host")
Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
The file rte_config.h is automatically generated and included.
No need to #include it.
The example performance-thread needs a makefile fix to avoid
overwriting the default cflags.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The CHANNEL_CMDS_MAX_VM_CHANNELS is duplicated in the channel_commands
header file. This commit removes that duplication.
Fixes: 210c383e24 ("power: packet format for vm power management")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
To differentiate libraries that break ABI, we add a library version number
suffix to the library, which must be incremented when a given libraries ABI is
broken. This patch enforces that addition, sets the initial abi soname
extension to 1 for each library and creates a symlink to the base SONAME so that
the test applications will link properly.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Add linker version script files to each DPDK library to put a stake in the
ground from which we can start cleaning up API's
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
rte_power_freq_min function did not include "extern" keyword,
causing linking errors.
Fixes: 445c6528b5 ("power: common interface for guest and host")
Reported-by: Ildar Mustafin <imustafin@bk.ru>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
librte_power now contains both rte_power_acpi_cpufreq and rte_power_kvm_vm
implementations.
Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Provides a command packet format for host and guest.
Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Moved the current librte_power implementation to rte_power_acpi_cpufreq, with
renaming of functions only.
Added rte_power_kvm_vm implementation to support Power Management from a VM.
librte_power now hides the implementation based on the environment used.
A new call rte_power_set_env() can explicidly set the environment, if not
called then auto-detection takes place.
rte_power_kvm_vm is subset of the librte_power APIs, the following is supported:
rte_power_init(unsigned lcore_id)
rte_power_exit(unsigned lcore_id)
rte_power_freq_up(unsigned lcore_id)
rte_power_freq_down(unsigned lcore_id)
rte_power_freq_min(unsigned lcore_id)
rte_power_freq_max(unsigned lcore_id)
The other unsupported APIs return -ENOTSUP
Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Allows for the opening of Virtio-Serial devices on a VM, where a DPDK
application can send packets to the host based monitor. The packet formatted is
specified in channel_commands.h
Each device appears as a serial device in path
/dev/virtio-ports/virtio.serial.port.<agent_type>.<lcore_num> where each lcore
in a DPDK application has exclusive to a device/channel.
Each channel is opened in non-blocking mode, after a successful open a test
packet is send to the host to ensure the host side is monitoring.
Signed-off-by: Alan Carew <alan.carew@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The function rte_snprintf serves no useful purpose. It is the
same as snprintf() for all valid inputs. Deprecate it and
replace all uses in current code.
Leave the tests for the deprecated function in place.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This commit removes trailing whitespace from lines in files. Almost all
files are affected, as the BSD license copyright header had trailing
whitespace on 4 lines in it [hence the number of files reporting 8 lines
changed in the diffstat].
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
[Thomas: remove spaces before tabs in libs]
[Thomas: remove more trailing spaces in non-C files]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>