summaryrefslogtreecommitdiff
path: root/drivers/cpufreq/cpufreq.c
AgeCommit message (Collapse)Author
2022-05-17cpufreq: make interface functions and lock holding state clearSchspa Shi
cpufreq_offline() calls offline() and exit() under the policy rwsem But they are called outside the rwsem in cpufreq_online(). Make cpufreq_online() call offline() and exit() as well as online() and init() under the policy rwsem to achieve a clear lock relationship. All of the init() and online() implementations in the tree only initialize the policy object without attempting to acquire the policy rwsem and they won't call cpufreq APIs attempting to acquire it. Signed-off-by: Schspa Shi <schspa@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-17cpufreq: Abort show()/store() for half-initialized policiesSchspa Shi
If policy initialization fails after the sysfs files are created, there is a possibility to end up running show()/store() callbacks for half-initialized policies, which may have unpredictable outcomes. Abort show()/store() in such a case by making sure the policy is active. Also dectivate the policy on such failures. Signed-off-by: Schspa Shi <schspa@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-12cpufreq: Rearrange locking in cpufreq_remove_dev()Rafael J. Wysocki
Currently, cpufreq_remove_dev() invokes the ->exit() driver callback without holding the policy rwsem which is inconsistent with what happens if ->exit() is invoked directly from cpufreq_offline(). It also manipulates the real_cpus mask and removes the CPU device symlink without holding the policy rwsem, but cpufreq_offline() holds the rwsem around the modifications thereof. For consistency, modify cpufreq_remove_dev() to hold the policy rwsem until the ->exit() callback has been called (or it has been determined that it is not necessary to call it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2022-05-12cpufreq: Split cpufreq_offline()Rafael J. Wysocki
Split the "core" part running under the policy rwsem out of cpufreq_offline() to allow the locking in cpufreq_remove_dev() to be rearranged more easily. As a side-effect this eliminates the unlock label that's not needed any more. No expected functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2022-05-12cpufreq: Reorganize checks in cpufreq_offline()Rafael J. Wysocki
Notice that cpufreq_offline() only needs to check policy_is_inactive() once and rearrange the code in there to make that happen. No expected functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2022-05-11cpufreq: Clear real_cpus mask from remove_cpu_dev_symlink()Viresh Kumar
add_cpu_dev_symlink() is responsible for setting the CPUs in the real_cpus mask, the reverse of which should be done from remove_cpu_dev_symlink() to make it look clean and avoid any breakage later on. Move the call to clear the mask to remove_cpu_dev_symlink(). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-09Revert "cpufreq: Fix possible race in cpufreq online error path"Viresh Kumar
This reverts commit f346e96267cd76175d6c201b40f770c0116a8a04. The commit tried to fix a possible real bug but it made it even worse. The fix was simply buggy as now an error out to out_offline_policy or out_exit_policy will try to release a semaphore which was never taken in the first place. This works fine only if we failed late, i.e. via out_destroy_policy. Fixes: f346e96267cd ("cpufreq: Fix possible race in cpufreq online error path") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-06cpufreq: Avoid unnecessary frequency updates due to mismatchViresh Kumar
For some platforms, the frequency returned by hardware may be slightly different from what is provided in the frequency table. For example, hardware may return 499 MHz instead of 500 MHz. In such cases it is better to avoid getting into unnecessary frequency updates, as we may end up switching policy->cur between the two and sending unnecessary pre/post update notifications, etc. This patch has chosen allows the hardware frequency and table frequency to deviate by 1 MHz for now, we may want to increase it a bit later on if someone still complains. Reported-by: Rex-BC Chen <rex-bc.chen@mediatek.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Jia-wei Chang <jia-wei.chang@mediatek.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-22cpufreq: Fix possible race in cpufreq online error pathSchspa Shi
When cpufreq online fails, the policy->cpus mask is not cleared and policy->rwsem is released too early, so the driver can be invoked via the cpuinfo_cur_freq sysfs attribute while its ->offline() or ->exit() callbacks are being run. Take policy->clk as an example: static int cpufreq_online(unsigned int cpu) { ... // policy->cpus != 0 at this time down_write(&policy->rwsem); ret = cpufreq_add_dev_interface(policy); up_write(&policy->rwsem); return 0; out_destroy_policy: for_each_cpu(j, policy->real_cpus) remove_cpu_dev_symlink(policy, get_cpu_device(j)); up_write(&policy->rwsem); ... out_exit_policy: if (cpufreq_driver->exit) cpufreq_driver->exit(policy); clk_put(policy->clk); // policy->clk is a wild pointer ... ^ | Another process access __cpufreq_get cpufreq_verify_current_freq cpufreq_generic_get // acces wild pointer of policy->clk; | | out_offline_policy: | cpufreq_policy_free(policy); | // deleted here, and will wait for no body reference cpufreq_policy_put_kobj(policy); } Address this by modifying cpufreq_online() to release policy->rwsem in the error path after the driver callbacks have run and to clear policy->cpus before releasing the semaphore. Fixes: 7106e02baed4 ("cpufreq: release policy->rwsem on error") Signed-off-by: Schspa Shi <schspa@gmail.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-09cpufreq: Reintroduce ready() callbackBjorn Andersson
This effectively revert '4bf8e582119e ("cpufreq: Remove ready() callback")', in order to reintroduce the ready callback. This is needed in order to be able to leave the thermal pressure interrupts in the Qualcomm CPUfreq driver disabled during initialization, so that it doesn't fire while related_cpus are still 0. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> [ Viresh: Added the Chinese translation as well and updated commit msg ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-12-28cpufreq: use default_groups in kobj_typeGreg Kroah-Hartman
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the cpufreq code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-17cpufreq: Fix initialization of min and max frequency QoS requestsRafael J. Wysocki
The min and max frequency QoS requests in the cpufreq core are initialized to whatever the current min and max frequency values are at the init time, but if any of these values change later (for example, cpuinfo.max_freq is updated by the driver), these initial request values will be limiting the CPU frequency unnecessarily unless they are changed by user space via sysfs. To address this, initialize min_freq_req and max_freq_req to FREQ_QOS_MIN_DEFAULT_VALUE and FREQ_QOS_MAX_DEFAULT_VALUE, respectively, so they don't really limit anything until user space updates them. Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-01cpufreq: Fix a comment in cpufreq_policy_freeTang Yizhou
Make the comment in blocking_notifier_call_chain() easier to understand. Signed-off-by: Tang Yizhou <tangyizhou@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-01cpufreq: Fix get_cpu_device() failure in add_cpu_dev_symlink()Xiongfeng Wang
When I hot added a CPU, I found 'cpufreq' directory was not created below /sys/devices/system/cpu/cpuX/. It is because get_cpu_device() failed in add_cpu_dev_symlink(). cpufreq_add_dev() is the .add_dev callback of a CPU subsys interface. It will be called when the CPU device registered into the system. The call chain is as follows: register_cpu() ->device_register() ->device_add() ->bus_probe_device() ->cpufreq_add_dev() But only after the CPU device has been registered, we can get the CPU device by get_cpu_device(), otherwise it will return NULL. Since we already have the CPU device in cpufreq_add_dev(), pass it to add_cpu_dev_symlink(). I noticed that the 'kobj' of the CPU device has been added into the system before cpufreq_add_dev(). Fixes: 2f0ba790df51 ("cpufreq: Fix creation of symbolic links to policy directories") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05cpufreq: Use CPUFREQ_RELATION_E in DVFS governorsVincent Donnefort
Let the governors schedutil, conservative and ondemand to work, if possible on efficient frequencies only. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05cpufreq: Introducing CPUFREQ_RELATION_EVincent Donnefort
This newly introduced flag can be applied by a governor to a CPUFreq relation, when looking for a frequency within the policy table. The resolution would then only walk through efficient frequencies. Even with the flag set, the policy max limit will still be honoured. If no efficient frequencies can be found within the limits of the policy, an inefficient one would be returned. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05cpufreq: Make policy min/max hard requirementsVincent Donnefort
When applying the policy min/max limits, the requested frequency is simply clamped to not be out of range. It means, however, if one of the boundaries isn't an available frequency, the frequency resolution can return a value out of those limits, depending on the relation used. e.g. freq{0,1,2} being available frequencies. freq0 policy->min freq1 policy->max freq2 | | | | | 17kHz 18kHz 19kHz 20kHz 21kHz __resolve_freq(21kHz, CPUFREQ_RELATION_L) -> 21kHz (out of bounds) __resolve_freq(17kHz, CPUFREQ_RELATION_H) -> 17kHz (out of bounds) If, during the policy init, we resolve the requested min/max to existing frequencies, we ensure that any CPUFREQ_RELATION_* would resolve to a frequency which is inside the policy min/max range. Making the policy limits rigid helps to introduce the inefficient frequencies support. Resolving an inefficient frequency to an efficient one should not transgress policy->max (which can be set for thermal reason) and having a value we can trust simplify this comparison. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-02cpufreq: Remove ready() callbackViresh Kumar
This isn't used anymore, get rid of it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-31Merge branch 'cpufreq/arm/linux-next' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq driver changes for v5.15 from Viresh Kumar: "This contains: - Update cpufreq-dt blocklist with more platforms (Bjorn Andersson). - Allow freq changes from any CPU for qcom-hw driver (Taniya Das). - Add DSVS interrupt's support for qcom-hw driver (Thara Gopinath). - A new callback (->register_em()) to register EM at a more convenient point of time." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support cpufreq: scmi: Use .register_em() to register with energy model cpufreq: vexpress: Use .register_em() to register with energy model cpufreq: scpi: Use .register_em() to register with energy model cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model cpufreq: omap: Use .register_em() to register with energy model cpufreq: mediatek: Use .register_em() to register with energy model cpufreq: imx6q: Use .register_em() to register with energy model cpufreq: dt: Use .register_em() to register with energy model cpufreq: Add callback to register with energy model cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag
2021-08-12cpufreq: Add callback to register with energy modelViresh Kumar
Many cpufreq drivers register with the energy model for each policy and do exactly the same thing. Follow the footsteps of thermal-cooling, to get it done from the cpufreq core itself. Provide a new callback, which will be called, if present, by the cpufreq core at the right moment (more on that in the code's comment). Also provide a generic implementation that uses dev_pm_opp_of_register_em(). This also allows us to register with the EM at a later point of time, compared to ->init(), from where the EM core can access cpufreq policy directly using cpufreq_cpu_get() type of helpers and perform other work, like marking few frequencies inefficient, this will be done separately. Reviewed-by: Quentin Perret <qperret@google.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-08-04cpufreq: Replace deprecated CPU-hotplug functionsSebastian Andrzej Siewior
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-30cpufreq: Remove ->resolve_freq()Viresh Kumar
Commit e3c062360870 ("cpufreq: add cpufreq_driver_resolve_freq()") introduced this callback, back in 2016, for drivers that provide the ->target() callback. The kernel hasn't seen a single user of it in the past 5 years and it is not likely to be used any time soon. Remove it for now. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-30cpufreq: Reuse cpufreq_driver_resolve_freq() in __cpufreq_driver_target()Viresh Kumar
__cpufreq_driver_target() open codes cpufreq_driver_resolve_freq(), lets make the former reuse the later. Separate out __resolve_freq() to accept relation as well as an argument and use it at both the locations. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-30cpufreq: Remove the ->stop_cpu() driver callbackViresh Kumar
Now that all users of ->stop_cpu() have been migrated to using other callbacks, drop it from the core. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Minor edits in the subject and changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-23cpufreq: Make cpufreq_online() call driver->offline() on errorsRafael J. Wysocki
In the CPU removal path the ->offline() callback provided by the driver is always invoked before ->exit(), but in the cpufreq_online() error path it is not, so ->exit() is expected to somehow know the context in which it has been called and act accordingly. That is less than straightforward, so make cpufreq_online() invoke the driver's ->offline() callback, if present, on errors before ->exit() too. This only potentially affects intel_pstate. Fixes: 91a12e91dc39 ("cpufreq: Allow light-weight tear down and bring up of CPUs") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-04-08cpufreq: Remove unused for_each_policy macroShaokun Zhang
Macro 'for_each_policy' has become unused since commit f963735a3ca3 ("cpufreq: Create for_each_{in}active_policy()"), so remove it. Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-19cpufreq: Fix typo in kerneldoc commentYue Hu
Change 'Terget' to 'Target'. Should be Target. Signed-off-by: Yue Hu <huyue2@yulong.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-04cpufreq: Remove CPUFREQ_STICKY flagViresh Kumar
During cpufreq driver's registration, if the ->init() callback for all the CPUs fail then there is not much point in keeping the driver around as it will only account for more of unnecessary noise, for example cpufreq core will try to suspend/resume the driver which never got registered properly. The removal of such a driver is avoided if the driver carries the CPUFREQ_STICKY flag. This was added way back [1] in 2004 and perhaps no one should ever need it now. A lot of drivers do set this flag, probably because they just copied it from other drivers. This was added earlier for some platforms [2] because their cpufreq drivers were getting registered before the CPUs were registered with subsys framework. And hence they used to fail. The same isn't true anymore though. The current code flow in the kernel is: start_kernel() -> kernel_init() -> kernel_init_freeable() -> do_basic_setup() -> driver_init() -> cpu_dev_init() -> subsys_system_register() //For CPUs -> do_initcalls() -> cpufreq_register_driver() Clearly, the CPUs will always get registered with subsys framework before any cpufreq driver can get probed. Remove the flag and update the relevant drivers. Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/include/linux/cpufreq.h?id=7cc9f0d9a1ab04cedc60d64fd8dcf7df224a3b4d # [1] Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/arch/arm/mach-sa1100/cpu-sa1100.c?id=f59d3bbe35f6268d729f51be82af8325d62f20f5 # [2] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-15cpufreq: Add special-purpose fast-switching callback for driversRafael J. Wysocki
First off, some cpufreq drivers (eg. intel_pstate) can pass hints beyond the current target frequency to the hardware and there are no provisions for doing that in the cpufreq framework. In particular, today the driver has to assume that it should not allow the frequency to fall below the one requested by the governor (or the required capacity may not be provided) which may not be the case and which may lead to excessive energy usage in some scenarios. Second, the hints passed by these drivers to the hardware need not be in terms of the frequency, so representing the utilization numbers coming from the scheduler as frequency before passing them to those drivers is not really useful. Address the two points above by adding a special-purpose replacement for the ->fast_switch callback, called ->adjust_perf, allowing the governor to pass abstract performance level (rather than frequency) values for the minimum (required) and target (desired) performance along with the CPU capacity to compare them to. Also update the schedutil governor to use the new callback instead of ->fast_switch if present and if the utilization mertics are frequency-invariant (that is requisite for the direct mapping between the utilization and the CPU performance levels to be a reasonable approximation). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-11cpufreq: Fix cpufreq_online() return value on errorsWang ShaoBo
Make cpufreq_online() return negative error codes on all errors that cause the policy to be destroyed, as appropriate. Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-11cpufreq: Fix up several kerneldoc commentsRafael J. Wysocki
Fix up the remaining kerneldoc comments that don't adhere to the expected format and clarify some of them a bit. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-16Merge back cpufreq updates for v5.11.Rafael J. Wysocki
2020-11-10cpufreq: Add strict_target to struct cpufreq_policyRafael J. Wysocki
Add a new field to be set when the CPUFREQ_GOV_STRICT_TARGET flag is set for the current governor to struct cpufreq_policy, so that the drivers needing to check CPUFREQ_GOV_STRICT_TARGET do not have to access the governor object during every frequency transition. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-10cpufreq: Introduce governor flagsRafael J. Wysocki
A new cpufreq governor flag will be added subsequently, so replace the bool dynamic_switching fleid in struct cpufreq_governor with a flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for the "dynamic switching" governors instead of it. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-02cpufreq: Drop restore_freq from struct cpufreq_policyRafael J. Wysocki
The restore_freq field in struct cpufreq_policy is only used by __target_index() in one place and a local variable in that function may as well be used instead of it, so drop it and modify __target_index() accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-10-29cpufreq: Introduce cpufreq_driver_test_flags()Rafael J. Wysocki
Add a helper function to test the flags of the cpufreq driver in use againt a given flags mask. In particular, this will be needed to test the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag in the schedutil governor. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-27cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS driver flagRafael J. Wysocki
Generally, a cpufreq driver may need to update some internal upper and lower frequency boundaries on policy max and min changes, respectively, but currently this does not work if the target frequency does not change along with the policy limit. Namely, if the target frequency does not change along with the policy min or max, the "target_freq == policy->cur" check in __cpufreq_driver_target() prevents driver callbacks from being invoked and they do not even have a chance to update the corresponding internal boundary. This particularly affects the "powersave" and "performance" governors that always set the target frequency to one of the policy limits and it never changes when the other limit is updated. To allow cpufreq the drivers needing to update internal frequency boundaries on policy limits changes to avoid this issue, introduce a new driver flag, CPUFREQ_NEED_UPDATE_LIMITS, that (when set) will neutralize the check mentioned above. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-10-16cpufreq: Improve code around unlisted freq checkViresh Kumar
The cpufreq core checks if the frequency programmed by the bootloaders is not listed in the freq table and programs one from the table in such a case. This is done only if the driver has set the CPUFREQ_NEED_INITIAL_FREQ_CHECK flag. Currently we print two separate messages, with almost the same content, and do this with a pr_warn() which may be a bit too much as the driver only asked us to check this as it expected this to be the case. Lower down the severity of the print message by switching to pr_info() instead and print a single message only. Reported-by: Sumit Gupta <sumitg@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Sumit Gupta <sumitg@nvidia.com> Tested-by: Sumit Gupta <sumitg@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-08cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale()Ionela Voinescu
Compared to other arch_* functions, arch_set_freq_scale() has an atypical weak definition that can be replaced by a strong architecture specific implementation. The more typical support for architectural functions involves defining an empty stub in a header file if the symbol is not already defined in architecture code. Some examples involve: - #define arch_scale_freq_capacity topology_get_freq_scale - #define arch_scale_freq_invariant topology_scale_freq_invariant - #define arch_scale_cpu_capacity topology_get_cpu_scale - #define arch_update_cpu_topology topology_update_cpu_topology - #define arch_scale_thermal_pressure topology_get_thermal_pressure - #define arch_set_thermal_pressure topology_set_thermal_pressure Bring arch_set_freq_scale() in line with these functions by renaming it to topology_set_freq_scale() in the arch topology driver, and by defining the arch_set_freq_scale symbol to point to the new function for arm and arm64. While there are other users of the arch_topology driver, this patch defines arch_set_freq_scale for arm and arm64 only, due to their existing definitions of arch_scale_freq_capacity. This is the getter function of the frequency invariance scale factor and without a getter function, the setter function - arch_set_freq_scale() has not purpose. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> (BL_SWITCHER and topology parts) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-05cpufreq: Move traces and update to policy->cur to cpufreq coreViresh Kumar
The cpufreq core handles the updates to policy->cur and recording of cpufreq trace events for all the governors except schedutil's fast switch case. Move that as well to cpufreq core for consistency and readability. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-05cpufreq: stats: Enable stats for fast-switch as wellViresh Kumar
Now that all the blockers are gone for enabling stats in fast-switching case, enable it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-18arch_topology, cpufreq: constify arch_* cpumasksValentin Schneider
The passed cpumask arguments to arch_set_freq_scale() and arch_freq_counters_available() are only iterated over, so reflect this in the prototype. This also allows to pass system cpumasks like cpu_online_mask without getting a warning. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-18cpufreq: report whether cpufreq supports Frequency Invariance (FI)Ionela Voinescu
Now that the update of the FI scale factor is done in cpufreq core for selected functions - target(), target_index() and fast_switch(), we can provide feedback to the task scheduler and architecture code on whether cpufreq supports FI. For this purpose provide an external function to expose whether the cpufreq drivers support FI, by using a static key. The logic behind the enablement of cpufreq-based invariance is as follows: - cpufreq-based invariance is disabled by default - cpufreq-based invariance is enabled if any of the callbacks above is implemented while the unsupported setpolicy() is not The cpufreq_supports_freq_invariance() function only returns whether cpufreq is instrumented with the arch_set_freq_scale() calls that result in support for frequency invariance. Due to the lack of knowledge on whether the implementation of arch_set_freq_scale() actually results in the setting of a scale factor based on cpufreq information, it is up to the architecture code to ensure the setting and provision of the scale factor to the scheduler. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-18cpufreq: move invariance setter calls in cpufreq coreIonela Voinescu
To properly scale its per-entity load-tracking signals, the task scheduler needs to be given a frequency scale factor, i.e. some image of the current frequency the CPU is running at. Currently, this scale can be computed either by using counters (APERF/MPERF on x86, AMU on arm64), or by piggy-backing on the frequency selection done by cpufreq. For the latter, drivers have to explicitly set the scale factor themselves, despite it being purely boiler-plate code: the required information depends entirely on the kind of frequency switch callback implemented by the driver, i.e. either of: target_index(), target(), fast_switch() and setpolicy(). The fitness of those callbacks with regard to driving the Frequency Invariance Engine (FIE) is studied below: target_index() ============== Documentation states that the chosen frequency "must be determined by freq_table[index].frequency". It isn't clear if it *has* to be that frequency, or if it can use that frequency value to do some computation that ultimately leads to a different frequency selection. All drivers go for the former, while the vexpress-spc-cpufreq has an atypical implementation which is handled separately. Therefore, the hook works on the assumption the core can use freq_table[index].frequency. target() ======= This has been flagged as deprecated since: commit 9c0ebcf78fde ("cpufreq: Implement light weight ->target_index() routine") It also doesn't have that many users: gx-suspmod.c:439: .target = cpufreq_gx_target, s3c24xx-cpufreq.c:428: .target = s3c_cpufreq_target, intel_pstate.c:2528: .target = intel_cpufreq_target, cppc_cpufreq.c:401: .target = cppc_cpufreq_set_target, cpufreq-nforce2.c:371: .target = nforce2_target, sh-cpufreq.c:163: .target = sh_cpufreq_target, pcc-cpufreq.c:573: .target = pcc_cpufreq_target, Similarly to the path taken for target_index() calls in the cpufreq core during a frequency change, all of the drivers above will mark the end of a frequency change by a call to cpufreq_freq_transition_end(). Therefore, cpufreq_freq_transition_end() can be used as the location for the arch_set_freq_scale() call to potentially inform the scheduler of the frequency change. This change maintains the previous functionality for the drivers that implement the target_index() callback, while also adding support for the few drivers that implement the deprecated target() callback. fast_switch() ============= This callback *has* to return the frequency that was selected. setpolicy() =========== This callback does not have any designated way of informing what was the end choice. But there are only two drivers using setpolicy(), and none of them have current FIE support: drivers/cpufreq/longrun.c:281: .setpolicy = longrun_set_policy, drivers/cpufreq/intel_pstate.c:2215: .setpolicy = intel_pstate_set_policy, The intel_pstate is known to use counter-driven frequency invariance. Conclusion ========== Given that the significant majority of current FIE enabled drivers use callbacks that lend themselves to triggering the setting of the FIE scale factor in a generic way, move the invariance setter calls to cpufreq core. As a result of setting the frequency scale factor in cpufreq core, after callbacks that lend themselves to trigger it, remove this functionality from the driver side. To be noted that despite marking a successful frequency change, many cpufreq drivers will consider the new frequency as the requested frequency, although this is might not be the one granted by the hardware. Therefore, the call to arch_set_freq_scale() is a "best effort" one, and it is up to the architecture if the new frequency is used in the new frequency scale factor setting (determined by the implementation of arch_set_freq_scale()) or eventually used by the scheduler (determined by the implementation of arch_scale_freq_capacity()). The architecture is in a better position to decide if it has better methods to obtain more accurate information regarding the current frequency and use that information instead (for example, the use of counters). Also, the implementation to arch_set_freq_scale() will now have to handle error conditions (current frequency == 0) in order to prevent the overhead in cpufreq core when the default arch_set_freq_scale() implementation is used. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Suggested-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-08-27cpufreq: No need to verify cpufreq_driver in show_scaling_cur_freq()Viresh Kumar
"cpufreq_driver" is guaranteed to be valid here, no need to check it here. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-08-11cpufreq: intel_pstate: Implement passive mode with HWP enabledRafael J. Wysocki
Allow intel_pstate to work in the passive mode with HWP enabled and make it set the HWP minimum performance limit (HWP floor) to the P-state value given by the target frequency supplied by the cpufreq governor, so as to prevent the HWP algorithm and the CPU scheduler from working against each other, at least when the schedutil governor is in use, and update the intel_pstate documentation accordingly. Among other things, this allows utilization clamps to be taken into account, at least to a certain extent, when intel_pstate is in use and makes it more likely that sufficient capacity for deadline tasks will be provided. After this change, the resulting behavior of an HWP system with intel_pstate in the passive mode should be close to the behavior of the analogous non-HWP system with intel_pstate in the passive mode, except that the HWP algorithm is generally allowed to make the CPU run at a frequency above the floor P-state set by intel_pstate in the entire available range of P-states, while without HWP a CPU can run in a P-state above the requested one if the latter falls into the range of turbo P-states (referred to as the turbo range) or if the P-states of all CPUs in one package are coordinated with each other at the hardware level. [Note that in principle the HWP floor may not be taken into account by the processor if it falls into the turbo range, in which case the processor has a license to choose any P-state, either below or above the HWP floor, just like a non-HWP processor in the case when the target P-state falls into the turbo range.] With this change applied, intel_pstate in the passive mode assumes complete control over the HWP request MSR and concurrent changes of that MSR (eg. via the direct MSR access interface) are overridden by it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2020-08-04Merge branch 'cpufreq/arm/linux-next' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq driver changes for v5.9-rc1 from Viresh Kumar: "Here are the details: - Adaptive voltage scaling (AVS) support and minor cleanups for brcmstb driver (Florian Fainelli and Markus Mayer). - A new tegra driver and cleanup for the existing one (Sumit Gupta and Jon Hunter). - Bandwidth level support for Qcom driver along with OPP changes (Sibi Sankar). - Cleanups to sti, cpufreq-dt, ap806, CPPC drivers (Viresh Kumar, Lee Jones, Ivan Kokshaysky, Sven Auhagen, and Xin Hao). - Make schedutil default governor for ARM (Valentin Schneider). - Fix dependency issues for imx (Walter Lozano). - Cleanup around cached_resolved_idx in cpufreq core (Viresh Kumar)." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: make schedutil the default for arm and arm64 cpufreq: cached_resolved_idx can not be negative cpufreq: Add Tegra194 cpufreq driver dt-bindings: arm: Add NVIDIA Tegra194 CPU Complex binding cpufreq: imx: Select NVMEM_IMX_OCOTP cpufreq: sti-cpufreq: Fix some formatting and misspelling issues cpufreq: tegra186: Simplify probe return path cpufreq: CPPC: Reuse caps variable in few routines cpufreq: ap806: fix cpufreq driver needs ap cpu clk cpufreq: cppc: Reorder code and remove apply_hisi_workaround variable cpufreq: dt: fix oops on armada37xx cpufreq: brcmstb-avs-cpufreq: send S2_ENTER / S2_EXIT commands to AVS cpufreq: brcmstb-avs-cpufreq: Support polling AVS firmware cpufreq: brcmstb-avs-cpufreq: more flexible interface for __issue_avs_command() cpufreq: qcom: Disable fast switch when scaling DDR/L3 cpufreq: qcom: Update the bandwidth levels on frequency change OPP: Add and export helper to set bandwidth cpufreq: blacklist SC7180 in cpufreq-dt-platdev cpufreq: blacklist SDM845 in cpufreq-dt-platdev
2020-07-30cpufreq: cached_resolved_idx can not be negativeViresh Kumar
It is not possible for cached_resolved_idx to be invalid here as the cpufreq core always sets index to a positive value. Change its type to unsigned int and fix qcom usage a bit. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-07-15cpufreq: cpufreq: Demote lots of function headers unworthy of kerneldoc statusLee Jones
Also provide missing function parameter description for 'cpu' and 'policy'. Fixes the following W=1 kernel build warning(s): drivers/cpufreq/cpufreq.c:60: warning: cannot understand function prototype: 'struct cpufreq_driver *cpufreq_driver; ' drivers/cpufreq/cpufreq.c:90: warning: Function parameter or member 'cpufreq_policy_notifier_list' not described in 'BLOCKING_NOTIFIER_HEAD' drivers/cpufreq/cpufreq.c:312: warning: Function parameter or member 'val' not described in 'adjust_jiffies' drivers/cpufreq/cpufreq.c:312: warning: Function parameter or member 'ci' not described in 'adjust_jiffies' drivers/cpufreq/cpufreq.c:538: warning: Function parameter or member 'policy' not described in 'cpufreq_driver_resolve_freq' drivers/cpufreq/cpufreq.c:686: warning: Function parameter or member 'file_name' not described in 'show_one' drivers/cpufreq/cpufreq.c:686: warning: Function parameter or member 'object' not described in 'show_one' drivers/cpufreq/cpufreq.c:731: warning: Function parameter or member 'file_name' not described in 'store_one' drivers/cpufreq/cpufreq.c:731: warning: Function parameter or member 'object' not described in 'store_one' drivers/cpufreq/cpufreq.c:741: warning: Function parameter or member 'policy' not described in 'show_cpuinfo_cur_freq' drivers/cpufreq/cpufreq.c:741: warning: Function parameter or member 'buf' not described in 'show_cpuinfo_cur_freq' drivers/cpufreq/cpufreq.c:754: warning: Function parameter or member 'policy' not described in 'show_scaling_governor' drivers/cpufreq/cpufreq.c:754: warning: Function parameter or member 'buf' not described in 'show_scaling_governor' drivers/cpufreq/cpufreq.c:770: warning: Function parameter or member 'policy' not described in 'store_scaling_governor' drivers/cpufreq/cpufreq.c:770: warning: Function parameter or member 'buf' not described in 'store_scaling_governor' drivers/cpufreq/cpufreq.c:770: warning: Function parameter or member 'count' not described in 'store_scaling_governor' drivers/cpufreq/cpufreq.c:806: warning: Function parameter or member 'policy' not described in 'show_scaling_driver' drivers/cpufreq/cpufreq.c:806: warning: Function parameter or member 'buf' not described in 'show_scaling_driver' drivers/cpufreq/cpufreq.c:815: warning: Function parameter or member 'policy' not described in 'show_scaling_available_governors' drivers/cpufreq/cpufreq.c:815: warning: Function parameter or member 'buf' not described in 'show_scaling_available_governors' drivers/cpufreq/cpufreq.c:859: warning: Function parameter or member 'policy' not described in 'show_related_cpus' drivers/cpufreq/cpufreq.c:859: warning: Function parameter or member 'buf' not described in 'show_related_cpus' drivers/cpufreq/cpufreq.c:867: warning: Function parameter or member 'policy' not described in 'show_affected_cpus' drivers/cpufreq/cpufreq.c:867: warning: Function parameter or member 'buf' not described in 'show_affected_cpus' drivers/cpufreq/cpufreq.c:901: warning: Function parameter or member 'policy' not described in 'show_bios_limit' drivers/cpufreq/cpufreq.c:901: warning: Function parameter or member 'buf' not described in 'show_bios_limit' drivers/cpufreq/cpufreq.c:1625: warning: Function parameter or member 'dev' not described in 'cpufreq_remove_dev' drivers/cpufreq/cpufreq.c:1625: warning: Function parameter or member 'sif' not described in 'cpufreq_remove_dev' drivers/cpufreq/cpufreq.c:2380: warning: Function parameter or member 'cpu' not described in 'cpufreq_get_policy' drivers/cpufreq/cpufreq.c:2771: warning: Function parameter or member 'driver' not described in 'cpufreq_unregister_driver' Signed-off-by: Lee Jones <lee.jones@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-02cpufreq: Remove the weakly defined cpufreq_default_governor()Viresh Kumar
The default cpufreq governor is chosen with the help of a "choice" option in the Kconfig which will always end up selecting one of the governors and so the weakly defined definition of cpufreq_default_governor() will never get called. Moreover, this makes us skip the checking of the return value of that routine as it will always be non NULL. If the Kconfig option changes in future, then we will start getting a link error instead (and it won't go unnoticed as in the case of the weak definition). Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>