summaryrefslogtreecommitdiff
path: root/drivers/powercap/intel_rapl_common.c
AgeCommit message (Collapse)Author
2024-03-13Merge tag 'pm-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "From the functional perspective, the most significant change here is the addition of support for Energy Models that can be updated dynamically at run time. There is also the addition of LZ4 compression support for hibernation, the new preferred core support in amd-pstate, new platforms support in the Intel RAPL driver, new model-specific EPP handling in intel_pstate and more. Apart from that, the cpufreq default transition delay is reduced from 10 ms to 2 ms (along with some related adjustments), the system suspend statistics code undergoes a significant rework and there is a usual bunch of fixes and code cleanups all over. Specifics: - Allow the Energy Model to be updated dynamically (Lukasz Luba) - Add support for LZ4 compression algorithm to the hibernation image creation and loading code (Nikhil V) - Fix and clean up system suspend statistics collection (Rafael Wysocki) - Simplify device suspend and resume handling in the power management core code (Rafael Wysocki) - Fix PCI hibernation support description (Yiwei Lin) - Make hibernation take set_memory_ro() return values into account as appropriate (Christophe Leroy) - Set mem_sleep_current during kernel command line setup to avoid an ordering issue with handling it (Maulik Shah) - Fix wake IRQs handling when pm_runtime_force_suspend() is used as a driver's system suspend callback (Qingliang Li) - Simplify pm_runtime_get_if_active() usage and add a replacement for pm_runtime_put_autosuspend() (Sakari Ailus) - Add a tracepoint for runtime_status changes tracking (Vilas Bhat) - Fix section title markdown in the runtime PM documentation (Yiwei Lin) - Enable preferred core support in the amd-pstate cpufreq driver (Meng Li) - Fix min_perf assignment in amd_pstate_adjust_perf() and make the min/max limit perf values in amd-pstate always stay within the (highest perf, lowest perf) range (Tor Vic, Meng Li) - Allow intel_pstate to assign model-specific values to strings used in the EPP sysfs interface and make it do so on Meteor Lake (Srinivas Pandruvada) - Drop long-unused cpudata::prev_cummulative_iowait from the intel_pstate cpufreq driver (Jiri Slaby) - Prevent scaling_cur_freq from exceeding scaling_max_freq when the latter is an inefficient frequency (Shivnandan Kumar) - Change default transition delay in cpufreq to 2ms (Qais Yousef) - Remove references to 10ms minimum sampling rate from comments in the cpufreq code (Pierre Gondois) - Honour transition_latency over transition_delay_us in cpufreq (Qais Yousef) - Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar) - General enhancements / cleanups to ARM cpufreq drivers (tianyu2, Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia Belova) - Update cpufreq-dt-platdev to block/approve devices (Richard Acayan) - Make the SCMI cpufreq driver get a transition delay value from firmware (Pierre Gondois) - Prevent the haltpoll cpuidle governor from shrinking guest poll_limit_ns below grow_start (Parshuram Sangle) - Avoid potential overflow in integer multiplication when computing cpuidle state parameters (C Cheng) - Adjust MWAIT hint target C-state computation in the ACPI cpuidle driver and in intel_idle to return a correct value for C0 (He Rongguang) - Address multiple issues in the TPMI RAPL driver and add support for new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui) - Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel Lezcano) - Fix kernel-doc for dtpm_create_hierarchy() (Yang Li) - Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth Norway Ananda) - Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil) - Fix a couple of warnings in the OPP core code related to W=1 builds (Viresh Kumar) - Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh Kumar) - Extend dev_pm_opp_data with turbo support (Sibi Sankar) - dt-bindings: drop maxItems from inner items (David Heidelberg)" * tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (95 commits) dt-bindings: opp: drop maxItems from inner items OPP: debugfs: Fix warning around icc_get_name() OPP: debugfs: Fix warning with W=1 builds cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h OPP: Extend dev_pm_opp_data with turbo support Fix cpupower-frequency-info.1 man page typo cpufreq: scmi: Set transition_delay_us firmware: arm_scmi: Populate fast channel rate_limit firmware: arm_scmi: Populate perf commands rate_limit cpuidle: ACPI/intel: fix MWAIT hint target C-state computation PM: sleep: wakeirq: fix wake irq warning in system suspend powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function cpufreq: Don't unregister cpufreq cooling on CPU hotplug PM: suspend: Set mem_sleep_current during kernel command line setup cpufreq: Honour transition_latency over transition_delay_us cpufreq: Limit resolving a frequency to policy min/max Documentation: PM: Fix runtime_pm.rst markdown syntax cpufreq: amd-pstate: adjust min/max limit perf cpufreq: Remove references to 10ms min sampling rate cpufreq: intel_pstate: Update default EPPs for Meteor Lake ...
2024-02-15x86/cpu/topology: Rename topology_max_die_per_package()Thomas Gleixner
The plural of die is dies. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240213210253.065874205@linutronix.de
2024-02-13powercap: intel_rapl: Add support for Arrow LakeSumeet Pawnikar
Add support for Arrow Lake platform to the RAPL common driver. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13powercap: intel_rapl: Add support for Lunar Lake-M paltformZhang Rui
Add support for Lunar Lake-M platform to the RAPL common driver. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13powercap: intel_rapl: Fix locking in TPMI RAPLZhang Rui
The RAPL framework uses CPU hotplug locking to protect the rapl_packages list and rp->lead_cpu to guarantee that 1. the RAPL package device is not unprobed and freed 2. the cached rp->lead_cpu is always valid for operations like powercap sysfs accesses. Current RAPL APIs assume being called from CPU hotplug callbacks which hold the CPU hotplug lock, but TPMI RAPL driver invokes the APIs in the driver's .probe() function without acquiring the CPU hotplug lock. Fix the problem by providing both locked and lockless versions of RAPL APIs. Fixes: 9eef7f9da928 ("powercap: intel_rapl: Introduce RAPL TPMI interface driver") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Cc: 6.5+ <stable@vger.kernel.org> # 6.5+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13powercap: intel_rapl: Fix a NULL pointer dereferenceZhang Rui
A NULL pointer dereference is triggered when probing the MMIO RAPL driver on platforms with CPU ID not listed in intel_rapl_common CPU model list. This is because the intel_rapl_common module still probes on such platforms even if 'defaults_msr' is not set after commit 1488ac990ac8 ("powercap: intel_rapl: Allow probing without CPUID match"). Thus the MMIO RAPL rp->priv->defaults is NULL when registering to RAPL framework. Fix the problem by adding sanity check to ensure rp->priv->rapl_defaults is always valid. Fixes: 1488ac990ac8 ("powercap: intel_rapl: Allow probing without CPUID match") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Cc: 6.5+ <stable@vger.kernel.org> # 6.5+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-24powercap: intel_rapl: Downgrade BIOS locked limits pr_warn() to pr_debug()Ville Syrjälä
Before the refactoring the pr_warn() only triggered when someone explicitly tried to write to a BIOS locked limit. After the refactoring the warning is also triggering during system resume. The user can't do anything about this so printing scary warnings doesn't make sense Keep the printk but make it pr_debug() instead of pr_warn() to make it clear it's not a serious issue. Fixes: 9050a9cd5e4c ("powercap: intel_rapl: Cleanup Power Limits support") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: 6.5+ <stable@vger.kernel.org> # 6.5+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-06powercap: intel_rapl: Fix invalid setting of Power Limit 4Srinivas Pandruvada
System runs at minimum performance, once powercap RAPL package domain enabled flag is changed from 1 to 0 to 1. Setting RAPL package domain enabled flag to 0, results in setting of power limit 4 (PL4) MSR 0x601 to 0. This implies disabling PL4 limit. The PL4 limit controls the peak power. So setting 0, results in some undesirable performance, which depends on hardware implementation. Even worse, when the enabled flag is set to 1 again. This will set PL4 MSR value to 0x01, which means reduce peak power to 0.125W. This will force system to run at the lowest possible performance on every PL4 supported system. Setting enabled flag should only affect the "enable" bit, not other bits. Here it is changing power limit. This is caused by a change which assumes that there is an enable bit in the PL4 MSR like other power limits. Although PL4 enable/disable bit is present with TPMI RAPL interface, it is not present with the MSR interface. There is a rapl_primitive_info defined for non existent PL4 enable bit and then it is used with the commit 9050a9cd5e4c ("powercap: intel_rapl: Cleanup Power Limits support") to enable PL4. This is wrong, hence remove this rapl primitive for PL4. Also in the function rapl_detect_powerlimit(), PL_ENABLE is used to check for the presence of power limits. Replace PL_ENABLE with PL_LIMIT, as PL_LIMIT must be present. Without this change, PL4 controls will not be available in the sysfs once rapl primitive for PL4 is removed. Fixes: 9050a9cd5e4c ("powercap: intel_rapl: Cleanup Power Limits support") Suggested-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Cc: 6.5+ <stable@vger.kernel.org> # 6.5+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-28Merge tag 'pm-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These rework cpuidle governors to call tick_nohz_get_sleep_length() less often and fix one of them, rework hibernation to avoid storing pages filled with zeros in hibernation images, switch over some cpufreq drivers to use void remove callbacks, fix and clean up multiple cpufreq drivers, fix the devfreq core, update the cpupower utility and make other assorted improvements. Specifics: - Rework the menu and teo cpuidle governors to avoid calling tick_nohz_get_sleep_length(), which is likely to become quite expensive going forward, too often and improve making decisions regarding whether or not to stop the scheduler tick in the teo governor (Rafael Wysocki) - Improve the performance of cpufreq_stats_create_table() in some cases (Liao Chang) - Fix two issues in the amd-pstate-ut cpufreq driver (Swapnil Sapkal) - Use clamp() helper macro to improve the code readability in cpufreq_verify_within_limits() (Liao Chang) - Set stale CPU frequency to minimum in intel_pstate (Doug Smythies) - Migrate cpufreq drivers for various platforms to use void remove callback (Yangtao Li) - Add online/offline/exit hooks for Tegra driver (Sumit Gupta) - Explicitly include correct DT includes in cpufreq (Rob Herring) - Frequency domain updates for qcom-hw driver (Neil Armstrong) - Modify AMD pstate driver return the highest_perf value (Meng Li) - Generic cleanups for cppc, mediatek and powernow driver (Liao Chang, Konrad Dybcio) - Add more platforms to cpufreq-arm driver's blocklist (AngeloGioacchino Del Regno and Konrad Dybcio) - brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva) - Add device PM helpers to allow a device to remain powered-on during system-wide transitions (Ulf Hansson) - Rework hibernation memory snapshotting to avoid storing pages filled with zeros in hibernation image files (Brian Geffon) - Add check to make sure that CPU latency QoS constraints do not use negative values (Clive Lin) - Optimize rp->domains memory allocation in the Intel RAPL power capping driver (xiongxin) - Remove recursion while parsing zones in the arm_scmi power capping driver (Cristian Marussi) - Fix memory leak in devfreq_dev_release() (Boris Brezillon) - Rewrite devfreq_monitor_start() kerneldoc comment (Manivannan Sadhasivam) - Explicitly include correct DT includes in devfreq (Rob Herring) - Remove unsued pm_runtime_update_max_time_suspended() extern declaration (YueHaibing) - Add turbo-boost support to cpupower (Wyes Karny) - Add support for amd_pstate mode change to cpupower (Wyes Karny) - Fix 'cpupower idle_set' command to accept only numeric values of arguments (Likhitha Korrapati) - Clean up OPP code and add new frequency related APIs to it (Viresh Kumar, Manivannan Sadhasivam) - Convert ti cpufreq/opp bindings to json schema (Nishanth Menon)" * tag 'pm-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits) cpufreq: tegra194: remove opp table in exit hook cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit() cpufreq: tegra194: add online/offline hooks cpuidle: teo: Avoid unnecessary variable assignments cpufreq: qcom-cpufreq-hw: add support for 4 freq domains dt-bindings: cpufreq: qcom-hw: add a 4th frequency domain cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver cpufreq: amd-pstate-ut: Remove module parameter access cpufreq: Use clamp() helper macro to improve the code readability PM: sleep: Add helpers to allow a device to remain powered-on PM: QoS: Add check to make sure CPU latency is non-negative PM: runtime: Remove unsued extern declaration of pm_runtime_update_max_time_suspended() cpufreq: intel_pstate: set stale CPU frequency to minimum cpufreq: stats: Improve the performance of cpufreq_stats_create_table() dt-bindings: cpufreq: Convert ti-cpufreq to json schema dt-bindings: opp: Convert ti-omap5-opp-supply to json schema OPP: Fix argument name in doc comment cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases. ...
2023-08-28Merge tag 'perf-core-2023-08-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event updates from Ingo Molnar: - AMD IBS improvements - Intel PMU driver updates - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events - Micro-optimize software events and the ring-buffer code - Misc cleanups & fixes * tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call perf/x86/intel: Add Crestmont PMU x86/cpu: Update Hybrids x86/cpu: Fix Crestmont uarch x86/cpu: Fix Gracemont uarch perf: Remove unused extern declaration arch_perf_get_page_size() perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA perf/mem: Introduce PERF_MEM_LVLNUM_UNC perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin locking/arch: Avoid variable shadowing in local_try_cmpxchg() perf/core: Use local64_try_cmpxchg in perf_swevent_set_period perf/x86: Use local64_try_cmpxchg perf/amd: Prevent grouping of IBS events
2023-08-09x86/cpu: Fix Gracemont uarchPeter Zijlstra
Alderlake N is an E-core only product using Gracemont micro-architecture. It fits the pre-existing naming scheme perfectly fine, adhere to it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230807150405.686834933@infradead.org
2023-08-04Merge back earlier power capping changes for v6.6.Rafael J. Wysocki
2023-08-01powercap: intel_rapl: Optimize rp->domains memory allocationxiongxin
In the memory allocation of rp->domains in rapl_detect_domains(), there is an additional memory of struct rapl_domain allocated, optimize the code here to save sizeof(struct rapl_domain) bytes of memory. Test in Intel NUC (i5-1135G7). Signed-off-by: xiongxin <xiongxin@kylinos.cn> Tested-by: xiongxin <xiongxin@kylinos.cn> Reviewed-by: Srinivas Pandruvada<srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-01powercap: intel_rapl: Fix a sparse warning in TPMI interfaceZhang Rui
Depends on the interface used, the RAPL registers can be either MSR indexes or memory mapped IO addresses. Current RAPL common code uses u64 to save both MSR and memory mapped IO registers. With this, when handling register address with an __iomem annotation, it triggers a sparse warning like below: sparse warnings: (new ones prefixed by >>) >> drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected unsigned long long [usertype] *tpmi_rapl_regs @@ got void [noderef] __iomem * @@ drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: expected unsigned long long [usertype] *tpmi_rapl_regs drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: got void [noderef] __iomem * Fix the problem by using a union to save the registers instead. Suggested-by: David Laight <David.Laight@ACULAB.COM> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202307031405.dy3druuy-lkp@intel.com/ Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Introduce core support for TPMI interfaceZhang Rui
Compared with existing RAPL MSR/MMIO Interface, the RAPL TPMI Interface 1. has per Power Limit register, thus has per Power Limit Lock and Enable bit. 2. doesn't have Power Limit Clamp bit. 3. the Power Limit Lock and Enable bits have different bit offsets. These mean RAPL TPMI Interface needs its own primitive information. RAPL TPMI Interface also has per domain unit register but with a different register layout. This requires a TPMI specific rapl_defaults call to decode the unit register. Introduce the RAPL core support for TPMI Interface. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Introduce RAPL I/F typeZhang Rui
Different RAPL Interfaces may have different primitive information and rapl_defaults calls. To better distinguish this difference in the RAPL framework code, introduce a new enum to represent different types of RAPL Interfaces. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Make cpu optional for rapl_packageZhang Rui
MSR RAPL Interface always removes a rapl_package when all the CPUs in that rapl_package are offlined. This is because it relies on an online CPU to access the MSR. But for RAPL Interface using MMIO registers, when all the cpus within the rapl_package are offlined, 1. the register can still be accessed 2. monitoring and setting the Power Pimits for the rapl_package is still meaningful because of uncore power. This means that, a valid rapl_package doesn't rely on one or more cpus being onlined. For this sense, make cpu optional for rapl_package. A rapl_package can be registered either using a CPU id to represent the physical package/die, or using the physical package id directly. Note that, the thermal throttling interrupt is not disabled via MSR_IA32_PACKAGE_THERM_INTERRUPT for such rapl_package at the moment. If it is still needed in the future, this can be achieved by selecting an onlined CPU using the physical package id. Note that, processor_thermal_rapl, the current MMIO RAPL Interface driver, can also be converted to register using a package id instead. But this is not done right now because processor_thermal_rapl driver works on single-package systems only, and offlining the only package will not happen. So keep the previous logic. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Remove redundant cpu parameterZhang Rui
For rapl_packages that rely on online CPUs to work, rp->lead_cpu always has a valid CPU id. Remove the redundant cpu parameter in rapl_check_domain(), rapl_detect_domains() and .check_unit() callbacks. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Add support for lock bit per Power LimitZhang Rui
With RAPL MSR/MMIO Interface, each RAPL domain has one Power Limit register. Each Power Limit register has one lock bit which tells the OS if the power limit register can be used or not. Depending on the number of power limits supported by the power limit register, the lock bit may apply to one or more power limits. With RAPL TPMI Interface, each RAPL domain has multiple Power Limits, and each Power Limit has its own register, with a lock bit. To handle this, introduce support for lock bit per Power Limit. For existing RAPL MSR/MMIO Interfaces, the lock bit in the Power Limit register applies to all the Power Limits controlled by this register. Remove the per domain DOMAIN_STATE_BIOS_LOCKED flag at the same time because it can be replaced by the per Power Limit lock. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Cleanup Power Limits supportZhang Rui
The same set of operations are shared by different Powert Limits, including Power Limit get/set, Power Limit enable/disable, clamping enable/disable, time window get/set, and max power get/set, etc. But the same operation for different Power Limit has different primitives because they use different registers/register bits. A lot of dirty/duplicate code was introduced to handle this difference. Introduce a universal way to issue Power Limit operations. Instead of using hardcoded primitive name directly, use Power Limit id + operation type, and hide all the Power Limit difference details in a central place, get_pl_prim(). Two helpers, rapl_read_pl_data() and rapl_write_pl_data(), are introduced at the same time to simplify the code for issuing Power Limit operations. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Use bitmap for Power LimitsZhang Rui
Currently, a RAPL package is registered with the number of Power Limits supported in each RAPL domain. But this doesn't tell which Power Limits are available. Using the number of Power Limits supported to guess the availability of each Power Limit is fragile. Use bitmap to represent the availability of each Power Limit. Note that PL1 is mandatory thus it does not need to be set explicitly by the RAPL Interface drivers. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Change primitive orderZhang Rui
The same set of operations are shared by different Powert Limits, including Power Limit get/set, Power Limit enable/disable, clamping enable/disable, time window get/set, and max power get/set, etc. But the same operation for different Power Limit has different primitives because they use different registers/register bits. A lot of dirty/duplicate code was introduced to handle this difference. Instead of using hardcoded primitive name directly, using Power Limit id + operation type is much cleaner. For this sense, move POWER_LIMIT1/POWER_LIMIT2/POWER_LIMIT4 to the beginning of enum rapl_primitives so that they can be reused as Power Limit ids. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Use index to initialize primitive informationZhang Rui
Currently, the RAPL primitive information array is required to be initialized in the order of enum rapl_primitives. This can break easily, especially when different RAPL Interfaces may support different sets of primitives. Convert the code to initialize the primitive information using array index explicitly. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Support per domain energy/power/time unitZhang Rui
RAPL MSR/MMIO Interface has package scope unit register but some RAPL domains like Dram/Psys may use a fixed energy unit value instead of the default unit value on certain platforms. RAPL TPMI Interface supports per domain unit register. For the above reasons, add support for per domain unit register and per domain energy/power/time unit. When per domain unit register is not available, use the package scope unit register as the per domain unit register for each RAPL domain so that this change is transparent to MSR/MMIO Interface. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Support per Interface primitive informationZhang Rui
RAPL primitive information is Interface specific. Although current MSR and MMIO Interface share the same RAPL primitives, new Interface like TPMI has its own RAPL primitive information. Save the primitive information in the Interface private structure. Plus, using variant name "rp" for struct rapl_primitive_info is confusing because "rp" is also used for struct rapl_package. Use "rpi" as the variant name for struct rapl_primitive_info, and rename the previous rpi[] array to avoid conflict. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Support per Interface rapl_defaultsZhang Rui
rapl_defaults is Interface specific. Although current MSR and MMIO Interface share the same rapl_defaults, new Interface like TPMI need its own rapl_defaults callbacks. Save the rapl_defaults information in the Interface private structure. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24powercap: intel_rapl: Allow probing without CPUID matchZhang Rui
Currently, CPU model checks is used to 1. get proper rapl_defaults callbacks for RAPL MSR/MMIO Interface. 2. create a platform device node for the intel_rapl_msr driver to probe. Both of these are only mandatory for the RAPL MSR/MMIO Interface. Make the CPUID match optional. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-13powercap: intel_rapl: Fix handling for large time windowZhang Rui
When setting the power limit time window, software updates the 'y' bits and 'f' bits in the power limit register, and the value hardware takes follows the formula below Time window = 2 ^ y * (1 + f / 4) * Time_Unit When handling large time window input from userspace, using left shifting breaks in two cases: 1. when ilog2(value) is bigger than 31, in expression "1 << y", left shifting by more than 31 bits has undefined behavior. This breaks 'y'. For example, on an Alderlake platform, "1 << 32" returns 1. 2. when ilog2(value) equals 31, "1 << 31" returns negative value because '1' is recognized as signed int. And this breaks 'f'. Given that 'y' has 5 bits and hardware can never take a value larger than 31, fix the first problem by clamp the time window to the maximum possible value that the hardware can take. Fix the second problem by using unsigned bit left shift. Note that hardware has its own maximum time window limitation, which may be lower than the time window value retrieved from the power limit register. When this happens, hardware clamps the input to its maximum time window limitation. That is why a software clamp is preferred to handle the problem on hand. Signed-off-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Adjusted the comment added by this change ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-20powercap: intel_rapl: add support for Emerald RapidsZhang Rui
Add Emerald Rapids to the list of supported processor models in the Intel RAPL power capping driver. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-20powercap: intel_rapl: add support for Meteor LakeZhang Rui
Add Meteor Lake to the list of supported processor models in the Intel RAPL power capping driver. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-24powercap: intel_rapl: Use standard Energy Unit for SPR Dram RAPL domainZhang Rui
Intel Xeon servers used to use a fixed energy resolution (15.3uj) for Dram RAPL domain. But on SPR, Dram RAPL domain follows the standard energy resolution as described in MSR_RAPL_POWER_UNIT. Remove the SPR dram_domain_energy_unit quirk. Fixes: 2d798d9f5967 ("powercap: intel_rapl: add support for Sapphire Rapids") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Cc: 5.9+ <stable@vger.kernel.org> # 5.9+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21powercap: intel_rapl: fix UBSAN shift-out-of-bounds issueChao Qin
When value < time_unit, the parameter of ilog2() will be zero and the return value is -1. u64(-1) is too large for shift exponent and then will trigger shift-out-of-bounds: shift exponent 18446744073709551615 is too large for 32-bit type 'int' Call Trace: rapl_compute_time_window_core rapl_write_data_raw set_time_window store_constraint_time_window_us Signed-off-by: Chao Qin <chao.qin@intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-03powercap: intel_rapl: Add support for RAPTORLAKE_SZhang Rui
Add intel_rapl support for RAPTORLAKE_S platform, which behaves the same as RAPTORLAKE and RAPTORLAKE_P platforms. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-05powercap: intel_rapl: Add support for RAPTORLAKE_PGeorge D Sworo
Add RAPTORLAKE_P to the list of supported processor models in the Intel RAPL power capping driver. Signed-off-by: George D Sworo <george.d.sworo@intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> [ rjw: Minor changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-19powercap: intel_rapl: remove redundant store to value after multiplyColin Ian King
There is no need to store the result of the multiply back to variable value after the multiplication. The store is redundant, replace *= with just *. Cleans up clang scan build warning: warning: Although the value stored to 'value' is used in the enclosing expression, the value is never actually read from 'value' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-18powercap: intel_rapl: add support for ALDERLAKE_NZhang Rui
Add ALDERLAKE_N to the list of supported processor models in the Intel RAPL power capping driver. Signed-off-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-22powercap: intel_rapl: add support for RaptorLakeZhang Rui
Add intel_rapl support for the RaptorLake platform. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-17powercap: intel_rapl: support new layout of Psys PowerLimit Register on SPRZhang Rui
On Sapphire Rapids, the layout of the Psys domain Power Limit Register is different from from what it was before. Enhance the code to support the new Psys PL register layout. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reported-and-tested-by: Alkattan Dana <dana.alkattan@intel.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-04powercap: intel_rapl: Replace deprecated CPU-hotplug functionsSebastian Andrzej Siewior
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-18powercap: Add Hygon Fam18h RAPL supportPu Wen
Enable Hygon Fam18h RAPL support for the power capping framework. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-12powercap: intel_rapl: Use topology interface in rapl_init_domains()Yunfeng Ye
It's not a good idea to access the phys_proc_id of cpuinfo directly. Use topology_physical_package_id(cpu) instead. Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-12powercap: intel_rapl: Use topology interface in rapl_add_package()Yunfeng Ye
It's not a good idea to access phys_proc_id and cpu_die_id directly. Use topology_physical_package_id(cpu) and topology_die_id(cpu) instead. Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-27powercap/intel_rapl: add support for AlderLake MobileZhang Rui
Add intel_rapl support for the AlderLake Mobile platform. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10powercap: RAPL: Add AMD Fam19h RAPL supportKim Phillips
AMD Family 19h's RAPL MSRs are identical to Family 17h's. Extend Family 17h's support to Family 19h. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Victor Ding <victording@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10powercap: Add AMD Fam17h RAPL supportVictor Ding
Enable AMD Fam17h RAPL support for the power capping framework. The support is as per AMD Fam17h Model31h (Zen2) and model 00-ffh (Zen1) PPR. Tested by comparing the results of following two sysfs entries and the values directly read from corresponding MSRs via /dev/cpu/[x]/msr: /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj Signed-off-by: Victor Ding <victording@google.com> Acked-by: Kim Phillips <kim.phillips@amd.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-02powercap/intel_rapl: remove unneeded semicolonTom Rix
A semicolon is not needed after a switch statement. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-16powercap/intel_rapl: enumerate Psys RAPL domain together with package RAPL ↵Zhang Rui
domain On multi-package systems, the Psys MSR is only valid for CPUs on specific package (master package). The current code makes the assumption that package 0 is the master package, but this is not true on new platforms like SPR. Fix the problem by emuerating the Psys RAPL domain for every package, so CPUs in slave packages will read 0 for the Psys energy counter and only CPUs in master packages can get a valid reading and register the Psys RAPL domain. The sysfs I/F for the Psys RAPL domain is not changed. Signed-off-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-16powercap/intel_rapl: Fix domain detectionZhang Rui
As only the low 32 bits of the RAPL_DOMAIN_REG_STATUS register represents the energy counter, and the high 32 bits are reserved, detect the existence of a RAPL domain by checking the low 32 bits only. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-16powercap: RAPL: Add support for LakefieldRicardo Neri
Simply add Lakefield model ID. No additional changes are needed. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> [ rjw: Minor subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-10powercap/intel_rapl: add support for AlderLakeZhang Rui
Add intel_rapl support for the AlderLake platform. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>