Currently, when we set the pstate governor to "performance", we check if
it is already set to this value, and if it is, we skip setting it.
However, we never save this value anywhere, so that next time we come
back and request the governor to be set to its original value, the
original value is empty.
Fix it by saving the original pstate governor first. While we're at it,
replace `strlcpy` with `rte_strscpy`.
Fixes: e6c6dc0f96c8 ("power: add p-state driver compatibility")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Previous fix for base frequency handling in pstate mode introduced a
couple of issues:
- When base_frequency file does not exist, it simply bails out because
of what appears to be accidental addition of FOPEN_OR_ERR_RET. This is
incorrect, as absence of this file is not fatal and is in fact
expected on kernel versions earlier than 5.3
- When base_frequency file does exist, it gets opened, but never gets
closed, resulting in a resource leak
Both issues also manifest themselves as Coverity defects (dead code, and
a resource leak), so this fix addresses both.
Coverity issue: 369693, 369694
Bugzilla ID: 668
Fixes: 4db9587bbf72 ("power: check sysfs base frequency")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
Some kernels may show in incorrect value for base frequency in
sysfs (e.g. 15 GHz). This throws off the SST-BF algorithm for
high and low priority cores. So if base_frequency is greater
than max turbo frequency, ignore, and handle it as a normal
core.
Known Kernel version with issue: Linux 5.8.7
Signed-off-by: David Hunt <david.hunt@intel.com>
During power initialization the pstate cpufreq api is
not setting the initial curr_idx of pstate_power_info
to corresponding current frequency index.
Without this the idx is always 0, which is causing the
below check to pass and returns without setting the initial
min/max frequency to system max frequency and this leads to
incorrect frequency settings when power_pstate_cpufreq_set_freq()
is called in the apps.
set_freq_internal(struct pstate_power_info *pi, uint32_t idx)
{
...
/* Check if it is the same as current */
if (idx == pi->curr_idx)
return 0;
...
}
scenario 1:
If system has starting scaling min/max: 1000/1000, and want to
set this to 2200/2200, the max frequency gets updated but not min.
scenario 2:
If system has starting scaling min/max: 2200/1000, and want to set
to 2200/2200, the max, min frequency was not updated. Since no change
in max that should be ok, but min was also ignored, which will be fixed
now with the new changes.
Fixes: e6c6dc0f ("power: add p-state driver compatibility")
Cc: stable@dpdk.org
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Reviewed-by: Liang Ma <liang.j.ma@intel.com>
Since rte_atomicXX APIs are not allowed to be used, use C11 atomic
builtins for power in use state update.
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: David Hunt <david.hunt@intel.com>
Currently, there is no way to know if the power management env is
supported without trying to initialize it. The init API also does
not distinguish between failure due to some error and failure due to
power management not being available on the platform in the first
place.
Thus, add an API that provides capability of probing support for a
specific power management API.
Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Calling pstate's or acpi's rte_power_freq_up() when on the highest
non-turbo frequency results in an error, if turbo is enabled in the BIOS,
but disabled via the power library.
The error is in the form of a return code and a RTE_LOG() entry
on the ERR level.
According to the API documentation, the frequency is scaled up
"according to the available frequencies". In case turbo is disabled,
that frequency is not available. This patch's rte_power_freq_up()
behaviour is also consistent with how rte_power_freq_max() is
implemented (i.e. the highest non-turbo frequency is set, in case
turbo is disabled).
Fixes: 445c6528b55f ("power: common interface for guest and host")
Fixes: e6c6dc0f96c8 ("power: add p-state driver compatibility")
Cc: stable@dpdk.org
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Liang Ma <liang.j.ma@intel.com>
Fix these as they are user visible. Found with codespell.
Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Fixes: 0cbce3a167f1 ("vfio: skip DMA map failure if already mapped")
Fixes: 445c6528b55f ("power: common interface for guest and host")
Fixes: e6c6dc0f96c8 ("power: add p-state driver compatibility")
Fixes: 8f972312b8f4 ("vhost: support vhost-user")
Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
The ACPI and PState CPU frequency scaling drivers used the
__rte_cache_aligned attribute without including rte_memory.h, which
turns what looks as the declaration of a cache line-aligned struct
into a non-aligned struct declaration and the definition of an
instance of the struct.
Fixes: e6c6dc0f96 ("power: add p-state driver compatibility")
Fixes: 445c6528b5 ("power: common interface for guest and host")
Cc: stable@dpdk.org
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Fix the resource leaking issue
Coverity issue: 337668
Fixes: b60fd5f8b1ce8f0a2c ("power: add bit for high frequency cores")
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
This patch will ensure the correct max frequency of a core is set in
the lcore_power_info struct when disabling turbo, while using the
intel pstate driver.
Fixes: e6c6dc0f96c8 ("power: add p-state driver compatibility")
Cc: stable@dpdk.org
Signed-off-by: Lee Daly <lee.daly@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Acked-by: Liang Ma <liang.j.ma@intel.com>
Do a global replace of snprintf(..."%s",...) with strlcpy, adding in the
rte_string_fns.h header if needed. The function changes in this patch were
auto-generated via command:
spatch --sp-file devtools/cocci/strlcpy.cocci --dir . --in-place
and then the files edited using awk to add in the missing header:
gawk -i inplace '/include <rte_/ && ! seen { \
print "#include <rte_string_fns.h>"; seen=1} {print}'
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
This patch adds a new bit in the capabilities mask that's returned by
rte_power_get_capabilities(), allowing application to query which cores
have the higher frequencies, and can then pin the workloads accordingly.
Returned Bits:
0 - Turbo Boost enabled
1 - Higher core base_frequency
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently the Power Libray stores the governor name with an embedded
newline read from the scaling_governor sysfs file. This patch strips
it out.
Fixes: 445c6528b55f ("power: common interface for guest and host")
Cc: stable@dpdk.org
Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Coverity issue: 328528
Fixes: e6c6dc0f96c8 ("power: add p-state driver compatibility")
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Lei Yao <lei.a.yao@intel.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
The power_pstate_cpufreq_freqs() function was returning -1 in an
unsigned int, causing buffer over-runs when the results were being
processed. This function should be returning zero for all error
conditions, similar to it's acpi relation, power_acpi_cpufreq_freqs().
Fixes: e6c6dc0f96c8 ("power: add p-state driver compatibility")
Signed-off-by: David Hunt <david.hunt@intel.com>
This patch fixes a segfault in the case where a null buffer is passed
to the following functions:
power_acpi_cpufreq_freqs()
power_pstate_cpufreq_freqs()
Fixes: 445c6528b55f ("power: common interface for guest and host")
Signed-off-by: David Hunt <david.hunt@intel.com>
In the power_set_governor_*() functions, we using fputs() on /sys
filesystem. However, we also need to call fflush() to ensure that
the write completes successfully. Otherwise the attempt to set the
power governor fails and the function returns as if it has
succeeded. This patch adds an fflush to ensure that the
write succeeds, otherwise returns an error.
Fixes: e6c6dc0f96c8 ("power: add p-state driver compatibility")
Signed-off-by: David Hunt <david.hunt@intel.com>
Previously, in order to use the power library, it was necessary
for the user to disable the intel_pstate driver by adding
“intel_pstate=disable” to the kernel command line for the system,
which causes the acpi_cpufreq driver to be loaded in its place.
This patch adds the ability for the power library use the intel-pstate
driver.
It adds a new suite of functions behind the current power library API,
and will seamlessly set up the user facing API function pointers to
the relevant functions depending on whether the system is running with
acpi_cpufreq kernel driver, intel_pstate kernel driver or in a guest,
using kvm. The library API and ABI is unchanged.
Signed-off-by: Liang Ma <liang.j.ma@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>